System Info
- `transformers` version: 4.44.2
- Platform: Linux-6.8.0-44-generic-x86_64-with-glibc2.39
- Python version: 3.12.3
- Huggingface_hub version: 0.24.7
- Safetensors version: 0.4.5
- Accelerate version: not installed
- Accelerate config: not found
- PyTorch version (GPU?): 2.4.1+cu121 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: No
Who can help?
speech models: @ylacombe, @eustlb
pipelines: @Rocketknight1
Information
Tasks
Reproduction
```python
import torch
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base.en",
    device="cpu",
    torch_dtype=torch.float32,
)
# https://github.com/openai/whisper/blob/main/tests/jfk.flac
pipe("./jfk.flac")
```
Expected behavior
This does return the expected output:

```python
{'text': ' And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.'}
```
But it also prints the following warning, which it would be nice to fix or suppress:

```
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
```
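Until this is addressed upstream, one blunt workaround is to raise the verbosity threshold of the `transformers` logger so the warning is not printed. A minimal sketch, assuming the warning is routed through Python's standard `logging` under the `transformers` namespace (which is how the library's logging utilities are built):

```python
import logging

# transformers emits its warnings through Python's standard logging,
# under the "transformers" logger hierarchy. Raising that logger's level
# to ERROR hides warnings like the attention-mask one above.
logging.getLogger("transformers").setLevel(logging.ERROR)

# Equivalent using the library's own helper:
#   from transformers.utils import logging as hf_logging
#   hf_logging.set_verbosity_error()
```

Note this silences all `transformers` warnings, not just this one, so it is a workaround rather than a fix.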
Thanks!