Avoid extra chunk in speech recognition #29539
amyeroberts
left a comment
Thanks for working on this!
The change looks reasonable to me, but let's get a second review from @sanchit-gandhi to confirm the desired behaviour here
ArthurZucker
left a comment
LGTM, did you run the slow whisper tests? Pipeline + model? 🤗
|
@ArthurZucker I run the slow pipeline tests with |
|
If you can run the |
|
I'm running on a Mac CPU. The Whisper tests took a while and there are a number of failures, but I don't expect any of them to be related. Perhaps that's because the assertions are written for GPU results and there are sometimes precision differences? |
|
I'll run them on GPU just to be sure! |
|
Thanks! FTR, I have now run them on main and got the same failures :) |
|
Could you also rebase on the main branch? I have a lot of failing tests:
FAILED tests/models/whisper/test_feature_extraction_whisper.py::WhisperFeatureExtractionTest::test_numpy_integration - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_feature_extraction_whisper.py::WhisperFeatureExtractionTest::test_torch_integration - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_feature_extraction_whisper.py::WhisperFeatureExtractionTest::test_zero_mean_unit_variance_normalization_trunc_np_longest - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_from_config - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_generate_left_padding - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_generate_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_generate_use_cache - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_inference - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_inference_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_generate_longform_with_prompt_ids - IndexError: index -1 is out of bounds for dimension 0 with size 0
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_generate_with_prompt_ids - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_generate_with_prompt_ids_and_forced_decoder_ids - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_generate_with_prompt_ids_and_no_non_prompt_forced_decoder_ids - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_language_detection - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_large_batched_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_large_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_large_generation_multilingual - huggingface_hub.utils._errors.GatedRepoError: 403 Client Error. (Request ID: Root=1-660397cf-42f5812e3b4c73a62732db88;cde84af6-e519-411b-83f4-394a7f8b638d)
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_large_logits_librispeech - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_small_en_logits_librispeech - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_speculative_decoding_distil - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_speculative_decoding_non_distil - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_en_batched_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_en_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_logits_librispeech - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_specaugment_librispeech - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_timestamp_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_token_timestamp_batch_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_token_timestamp_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_token_timestamp_generation_longform - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_multi_batch - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_multi_batch_hard - ImportError: To support decoding audio files, please install 'librosa' and 'soundfile'.
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_multi_batch_hard_prev_cond - ImportError: To support decoding audio files, please install 'librosa' and 'soundfile'.
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_multi_batch_prev_cond - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_prompt_ids - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_single_batch - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_single_batch_prev_cond - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperEncoderModelTest::test_flash_attn_2_inference - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperEncoderModelTest::test_flash_attn_2_inference_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_from_config - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_generate_left_padding - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_generate_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_generate_use_cache - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_inference - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_inference_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_model_parallel_beam_search - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_model_parallelism - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
========================================================================================== 47 failed, 419 passed, 236 skipped, 163 warnings in 698.76s (0:11:38) =========================================================================================== |
697eeb4 to a7ccb75
|
Done! |
|
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored. |
|
Up :) |
|
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored. |
|
@ArthurZucker kindly ping :D |
Awesome work, thanks @jonatanklosko! Note that this PR only changes the pipeline, not the Whisper model class. So given @jonatanklosko has confirmed these slow tests pass, this is good to merge!
I was confused by this conditional before, but now I'm revisiting some logic and I am more convinced that it is not necessary.
I think it's clear if we look at the test I changed. Suppose we use chunk length 100, left context 20, and right context 10. If the input has length 100, the current logic returns two chunks with lengths 100 and 30 respectively. However, the input fits perfectly in a single chunk, so I don't see a reason why using two chunks would be helpful. The conditional really only makes an off-by-one distinction: if the input had length 99, the current logic does return a single chunk (which makes sense). From my understanding, it only makes sense to produce two chunks once the input has length 101, since it inherently does not fit.
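To make the off-by-one concrete, here is a minimal sketch of the chunk-boundary logic described above. It is a simplified model rather than the pipeline's actual code: `chunk_lengths` and its `old_condition` flag are hypothetical names introduced for illustration, and it only tracks chunk lengths instead of running a feature extractor.

```python
def chunk_lengths(inputs_len, chunk_len, stride_left, stride_right, old_condition):
    """Return the lengths of the chunks produced for an input of `inputs_len` samples.

    Hypothetical helper: `old_condition=True` models the conditional discussed
    above (strict comparison when there is a right context), while
    `old_condition=False` models the simplified check.
    """
    # Consecutive chunks overlap by the left and right context.
    step = chunk_len - stride_left - stride_right
    lengths = []
    for start in range(0, inputs_len, step):
        end = start + chunk_len
        length = min(end, inputs_len) - start
        if old_condition and stride_right > 0:
            # A chunk ending exactly at the end of the input is NOT treated as last.
            is_last = end > inputs_len
        else:
            # A chunk that reaches the end of the input is the last one.
            is_last = end >= inputs_len
        left = 0 if start == 0 else stride_left
        # Skip chunks that contain nothing beyond their left context.
        if length > left:
            lengths.append(length)
        if is_last:
            break
    return lengths


for n in (99, 100, 101):
    old = chunk_lengths(n, 100, 20, 10, old_condition=True)
    new = chunk_lengths(n, 100, 20, 10, old_condition=False)
    print(n, old, new)
```

Under these assumptions, the two variants differ only for an input of exactly 100 samples, where the strict comparison yields an extra 30-sample chunk.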
For more context, see the PR that introduced it, #21612. I believe the actual fix (for the linked issue) in that PR is the missing `if is_last: break`. I'm guessing the condition was introduced to make the existing test pass, and I think it's the test that was wrong.
I ran `RUN_SLOW=1 pytest tests/pipelines/test_pipelines_automatic_speech_recognition.py` locally and it passed.
cc @ArthurZucker