Conversation

@Rocketknight1 (Member)

cc @sanchit-gandhi - this PR just sets return_attention_mask=True on the preprocessors in the automatic_speech_recognition pipeline to avoid warnings caused by missing attention masks. It works okay in my testing, but please let me know if you think it could cause any problems!

Fixes #33498
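
For context, a minimal standalone sketch of what the flag does at the feature-extractor level (a hypothetical example assuming a Whisper checkpoint; the pipeline now sets this internally, so user code does not need to):

    import numpy as np
    from transformers import WhisperFeatureExtractor

    feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-tiny")

    audio = np.zeros(16000, dtype=np.float32)  # 1 second of silence at 16 kHz

    inputs = feature_extractor(
        audio,
        sampling_rate=16000,
        return_tensors="pt",
        return_attention_mask=True,  # the flag this PR enables in the pipeline
    )
    print(inputs.keys())  # now includes "attention_mask" alongside "input_features"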

@Rocketknight1 force-pushed the return_attention_mask_in_asr branch from 8d0d13b to 3e7b53e on September 16, 2024 13:28
@Vaibhavs10 (Contributor)

cc: @ylacombe here

@ylacombe (Contributor)

Hey @Rocketknight1, thanks for opening the PR!
In theory, I don't see any problems with this fix. Have you been able to run all the slow ASR pipeline tests here?

@Rocketknight1 (Member, Author)

@ylacombe there are some slow tests that I can't get working on my local machine (even on main). However, all the tests that do run also pass with this PR!

@ylacombe (Contributor)

Let me know if you want me to run them!

@Rocketknight1 (Member, Author)

@ylacombe Sure!

@ylacombe (Contributor) left a comment

So, I've first opened a PR (#33545) to fix some of the slow tests that were failing because of how data was loaded.
Your PR doesn't add any failing tests compared to main, and the changes make sense, so I think it should be OK to merge!

@Rocketknight1 (Member, Author)

Okay, cool! cc @LysandreJik for core maintainer review.

@LysandreJik (Member) left a comment

Thanks for the PR @Rocketknight1!

@Rocketknight1 merged commit 8efc06e into main on September 18, 2024
@Rocketknight1 deleted the return_attention_mask_in_asr branch on September 18, 2024 at 14:57
@monica-sekoyan

Hi @Rocketknight1,
I think return_attention_mask should also be added to the chunk_iter function, so that the warning also goes away when we specify chunk_length_s.
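
To illustrate the suggestion, a hedged standalone sketch (this paraphrases the pipeline's chunking behaviour; the real chunk_iter helper in the transformers source may differ):

    import numpy as np
    from transformers import WhisperFeatureExtractor

    feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-tiny")

    audio = np.zeros(16000 * 45, dtype=np.float32)  # 45 s of silence at 16 kHz
    chunk_len = 16000 * 30  # 30-second chunks, as with chunk_length_s=30

    # When the pipeline chunks audio, each chunk goes through the feature
    # extractor; the suggestion is to pass return_attention_mask=True there too.
    for start in range(0, len(audio), chunk_len):
        chunk = audio[start : start + chunk_len]
        processed = feature_extractor(
            chunk,
            sampling_rate=feature_extractor.sampling_rate,
            return_tensors="pt",
            return_attention_mask=True,  # proposed addition for the chunked path
        )
        print(processed["attention_mask"].shape)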

@Holmes-GU

> cc @sanchit-gandhi - this PR just sets return_attention_mask=True on the preprocessors in the automatic_speech_recognition pipeline to avoid warnings caused by missing attention masks. It works okay in my testing, but please let me know if you think it could cause any problems!
>
> Fixes #33498

Hi, where can I set return_attention_mask=True?

BernardZach pushed a commit to BernardZach/transformers that referenced this pull request on Dec 5, 2024
@AlessandroSpallina

I'm still seeing the error fixed here with the latest transformers==4.47.1:

    The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
    Whisper did not predict an ending timestamp, which can happen if audio is cut off in the middle of a word. Also make sure WhisperTimeStampLogitsProcessor was used during generation.

when running this code:

    import torch
    from transformers import pipeline

    # self.model, self.device, and processor come from the reporter's
    # surrounding class and are not defined in this snippet.
    pipe = pipeline(
        "automatic-speech-recognition",
        model=self.model,
        torch_dtype=torch.float16,
        chunk_length_s=30,
        batch_size=24,
        return_timestamps=True,
        device=self.device,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        model_kwargs={"use_flash_attention_2": True},
        generate_kwargs={
            "max_new_tokens": 128,
        },
    )
