Avoid extra chunk in speech recognition #29539

Merged
sanchit-gandhi merged 1 commit into huggingface:main from jonatanklosko:jk-whisper-chunking
May 22, 2024

Conversation

@jonatanklosko
Contributor

I was confused by this conditional before, but now I'm revisiting some logic and I am more convinced that it is not necessary.

I think it's clear if we look at the test I changed. Suppose we use chunk length 100, left context 20, and right context 10. If the input has length 100, the current logic returns two chunks with lengths 100 and 30 respectively. However, the input fits perfectly in a single chunk, so I don't see a reason why using two chunks would be helpful. The conditional really only makes an off-by-one distinction: if the input had length 99, the current logic would return a single chunk (which makes sense). From my understanding, it only makes sense to produce two chunks if the input has length 101, since it inherently does not fit.
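
To make the off-by-one concrete, here is a minimal sketch of the chunk boundaries described above. It is not the pipeline's actual code: the function name `chunk_lengths` and the `legacy` flag are hypothetical, and it assumes the chunk start advances by chunk length minus both contexts and that iteration stops at the first chunk flagged as last. It only tracks chunk lengths and ignores feature extraction entirely.

```python
def chunk_lengths(input_len, chunk_len, stride_left, stride_right, legacy=False):
    """Return the length of each chunk produced for an input of `input_len` samples.

    Hypothetical helper for illustration only. `legacy=True` models the
    conditional discussed in this PR (strict comparison when there is a
    right context); `legacy=False` models the proposed simplification.
    """
    # Assumption: consecutive chunks overlap by the left and right context.
    step = chunk_len - stride_left - stride_right
    lengths = []
    for start in range(0, input_len, step):
        end = start + chunk_len
        if legacy and stride_right > 0:
            # A chunk that ends exactly at the input boundary is NOT treated as last.
            is_last = end > input_len
        else:
            # A chunk that reaches the end of the input is the last one.
            is_last = end >= input_len
        lengths.append(min(end, input_len) - start)
        if is_last:
            break
    return lengths


for n in (99, 100, 101):
    print(n, chunk_lengths(n, 100, 20, 10, legacy=True), chunk_lengths(n, 100, 20, 10))
# 99 [99] [99]
# 100 [100, 30] [100]
# 101 [100, 31] [100, 31]
```

Under these assumptions the two variants differ only when the input length is exactly 100, which is the case the changed test covers.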

For more context, see #21612, the PR that introduced it. I believe the actual fix (for the linked issue) in that PR is the previously missing `if is_last: break`. I'm guessing the condition was introduced to make the existing test pass, and I think it's the test that was wrong.

I ran `RUN_SLOW=1 pytest tests/pipelines/test_pipelines_automatic_speech_recognition.py` locally and it passed.

cc @ArthurZucker

Contributor

@amyeroberts amyeroberts left a comment

Thanks for working on this!

The change looks reasonable to me, but let's get a second review from @sanchit-gandhi to confirm the desired behaviour here.

Collaborator

@ArthurZucker ArthurZucker left a comment

LGTM, did you run the slow whisper tests? Pipeline + model? 🤗

@jonatanklosko
Contributor Author

@ArthurZucker I ran the slow pipeline tests with `RUN_SLOW=1 pytest tests/pipelines/test_pipelines_automatic_speech_recognition.py`, since the change is specific to that pipeline. If there are other relevant tests, let me know which :)

@ArthurZucker
Collaborator

ArthurZucker commented Mar 25, 2024

If you could run the Whisper slow tests, that would be amazing! `RUN_SLOW=1 pytest tests/models/whisper`.
Are you running this on GPU?

@jonatanklosko
Contributor Author

I'm running on a Mac CPU. The Whisper tests took a while and there are a number of failures, but I don't expect any of them to be related. Perhaps that's because the assertions are for GPU results and sometimes there are precision differences?

@ArthurZucker
Collaborator

I'll run them on GPU just to be sure!

@jonatanklosko
Contributor Author

Thanks! FTR, I have now run them on main and got the same failures :)

@ArthurZucker
Collaborator

Could you also rebase on the main branch? I have a lot of failing tests:

FAILED tests/models/whisper/test_feature_extraction_whisper.py::WhisperFeatureExtractionTest::test_numpy_integration - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_feature_extraction_whisper.py::WhisperFeatureExtractionTest::test_torch_integration - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_feature_extraction_whisper.py::WhisperFeatureExtractionTest::test_zero_mean_unit_variance_normalization_trunc_np_longest - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_from_config - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_generate_left_padding - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_generate_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_generate_use_cache - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_inference - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_inference_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_generate_longform_with_prompt_ids - IndexError: index -1 is out of bounds for dimension 0 with size 0
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_generate_with_prompt_ids - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_generate_with_prompt_ids_and_forced_decoder_ids - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_generate_with_prompt_ids_and_no_non_prompt_forced_decoder_ids - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_language_detection - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_large_batched_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_large_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_large_generation_multilingual - huggingface_hub.utils._errors.GatedRepoError: 403 Client Error. (Request ID: Root=1-660397cf-42f5812e3b4c73a62732db88;cde84af6-e519-411b-83f4-394a7f8b638d)
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_large_logits_librispeech - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_small_en_logits_librispeech - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_speculative_decoding_distil - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_speculative_decoding_non_distil - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_en_batched_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_en_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_logits_librispeech - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_specaugment_librispeech - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_timestamp_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_token_timestamp_batch_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_token_timestamp_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_token_timestamp_generation_longform - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_multi_batch - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_multi_batch_hard - ImportError: To support decoding audio files, please install 'librosa' and 'soundfile'.
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_multi_batch_hard_prev_cond - ImportError: To support decoding audio files, please install 'librosa' and 'soundfile'.
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_multi_batch_prev_cond - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_prompt_ids - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_single_batch - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_single_batch_prev_cond - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperEncoderModelTest::test_flash_attn_2_inference - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperEncoderModelTest::test_flash_attn_2_inference_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_from_config - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_generate_left_padding - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_generate_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_generate_use_cache - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_inference - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_inference_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_model_parallel_beam_search - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_model_parallelism - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
========================================================================================== 47 failed, 419 passed, 236 skipped, 163 warnings in 698.76s (0:11:38) ===========================================================================================

@jonatanklosko
Contributor Author

Done!

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@jonatanklosko
Contributor Author

Up :)

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@jonatanklosko
Contributor Author

@ArthurZucker kindly ping :D

Contributor

@sanchit-gandhi sanchit-gandhi left a comment

Awesome work, thanks @jonatanklosko! Note that this PR only changes the pipeline, not the Whisper model class. So given @jonatanklosko has confirmed these slow tests pass, this is good to merge!

@sanchit-gandhi sanchit-gandhi merged commit 1518508 into huggingface:main May 22, 2024
@jonatanklosko jonatanklosko deleted the jk-whisper-chunking branch May 22, 2024 13:30
itazap pushed a commit that referenced this pull request May 30, 2024