Replies: 10 comments 14 replies
-
I haven't tried it myself, but perhaps you could run two separate Whisper passes over the same audio, one forced to French and one forced to English. Afterwards, run both transcripts through some grammar or language checker that discards the segments decoded in the wrong language, then merge the two text files. Alternatively, there must be a Python library out there that detects voice signatures. It sounds like a lot of work, though.
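For anyone who wants to try that two-pass idea, here is a minimal sketch, assuming the open-source openai-whisper package; the model size and the file name mixed_audio.mp3 are placeholders:

```python
import whisper

model = whisper.load_model("large")

# One decoding pass per expected language, with the language forced so
# Whisper never has to guess it.
result_fr = model.transcribe("mixed_audio.mp3", language="fr", task="transcribe")
result_en = model.transcribe("mixed_audio.mp3", language="en", task="transcribe")

# Each pass returns timestamped segments; the French pass produces garbage
# for the English parts and vice versa, so a language or grammar check is
# still needed to decide, per time window, which pass to keep when merging.
for seg in result_fr["segments"]:
    print(f"FR [{seg['start']:.1f}-{seg['end']:.1f}] {seg['text'].strip()}")
for seg in result_en["segments"]:
    print(f"EN [{seg['start']:.1f}-{seg['end']:.1f}] {seg['text'].strip()}")
```

The hard part is still the merging step, since the two passes will not produce identically aligned segments.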
-
Hello, I have the same problem. I only want it to transcribe the different languages in the same file, with no translation. Did you find out how to do it already, other than cutting the audio into small parts, transcribing them, and then combining them again? That way is too troublesome. Thank you.
-
I don't think it translates to French - it probably mistakenly transcribes the English as if it were French, so it just comes up with nonsensical sentences.
-
I also faced this issue, and it seems that Whisper really does transcribe and THEN translate the output.
-
Any news or ETA on this? I can confirm that the transcription endpoint (using OpenAI's speech-to-text API) often translates instead of merely transcribing. It happens maybe 20 to 50% of the time in my experience, so it really shouldn't be hard to reproduce, but I can provide sample audio if needed. It's annoying because it makes the behavior of the API seem random. I only want transcriptions, not translations (if I wanted translations I'd use the translation endpoint) 😕.
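One thing worth trying (just a sketch, assuming the current openai Python package; the file name and language code are placeholders): the transcription endpoint accepts an optional language parameter, which at least takes auto-detection out of the equation when you already know the input language.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# "meeting.mp3" is a placeholder; language is an ISO-639-1 code that pins
# the input language instead of leaving it to auto-detection.
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="fr",
    )

print(transcript.text)
```

It obviously doesn't solve the case where a single file genuinely mixes languages.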
-
You can cut your audio into one-minute slices and then transcribe each slice separately. I tried this method and got the results I wanted.
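A rough sketch of that slicing approach, assuming pydub and the open-source openai-whisper package; the chunk length, model size and file name are just examples:

```python
import whisper
from pydub import AudioSegment

model = whisper.load_model("medium")
audio = AudioSegment.from_file("mixed_audio.mp3")

chunk_ms = 60_000  # one-minute slices, as suggested above
texts = []
for i, start in enumerate(range(0, len(audio), chunk_ms)):
    chunk_path = f"chunk_{i}.mp3"
    audio[start:start + chunk_ms].export(chunk_path, format="mp3")
    # Language detection runs again on every slice, so each slice comes out
    # in whatever language is actually spoken in it.
    result = model.transcribe(chunk_path, task="transcribe")
    texts.append(result["text"].strip())

print("\n".join(texts))
```

The obvious drawback is that a slice boundary can fall in the middle of a sentence, so overlapping slices or silence-based splitting may work better than fixed one-minute cuts.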
-
I had the same challenge: a multilingual audio file with speakers using different languages. I tried the suggestion about the prompt, but I found it unreliable and not always working. Instead, what I did is segment the input audio file by speaker. In my case the assumption was that each speaker uses one single language, so by segmenting the input audio file by speaker, Whisper runs language detection for each of those segments; then you just concatenate the results and that's it. To obtain the speaker segmentation you can use several toolkits for speaker diarization; I decided to go with PyAnnote: https://github.com/pyannote/pyannote-audio So basically you first run speaker diarization and then transcribe each speaker segment with Whisper.
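A minimal sketch of that diarize-then-transcribe pipeline, assuming pyannote.audio 3.x, pydub and the open-source openai-whisper package; the Hugging Face token and the file names are placeholders:

```python
import whisper
from pydub import AudioSegment
from pyannote.audio import Pipeline

# Diarize first, then transcribe each speaker turn separately, so that
# Whisper's language detection runs per turn instead of once per file.
diarizer = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token="YOUR_HF_TOKEN"
)
asr = whisper.load_model("large")
audio = AudioSegment.from_file("meeting.wav")

diarization = diarizer("meeting.wav")
for i, (turn, _, speaker) in enumerate(diarization.itertracks(yield_label=True)):
    segment_path = f"segment_{i}.wav"
    # pydub slices in milliseconds, pyannote reports seconds.
    audio[int(turn.start * 1000):int(turn.end * 1000)].export(segment_path, format="wav")
    result = asr.transcribe(segment_path, task="transcribe")
    print(f"{speaker} [{turn.start:.1f}-{turn.end:.1f}] {result['text'].strip()}")
```

Very short turns (a second or two) can still be misdetected, so it can help to merge consecutive turns from the same speaker before transcribing.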
-
Hi, I am still facing this issue with a multilingual audio file; it's not working. Please share any workarounds. My code:

```python
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model_id = "openai/whisper-large-v3"

model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id).to(device)
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    device=device,
)
```
-
```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
model_id = "openai/whisper-large-v3"

model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id).to(device)
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    device=device,
)
```
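With the pipeline built like that, a call along these lines (just a sketch; the file name is a placeholder) at least keeps the task pinned to transcription:

```python
# chunk_length_s processes the file in 30-second windows, and forcing
# task="transcribe" prevents the translate task from being picked - but the
# output language still follows whatever language Whisper detects (or is told).
result = pipe(
    "audio.mp3",
    chunk_length_s=30,
    return_timestamps=True,
    generate_kwargs={"task": "transcribe"},
)

print(result["text"])
for chunk in result["chunks"]:
    print(chunk["timestamp"], chunk["text"])
```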
-
Same problem here. I can't solve this using speaker diarization, since a single speaker can speak multiple languages. Any ideas on how to solve this?
-
Hello,
I have a small issue with the transcription:
I have an audio file where 2 people are speaking 2 different languages - French and English - and I would like to get the raw transcription of this audio in the spoken languages.
However, when I run the model, it automatically translates the whole audio into French (French being the detected language, as the audio starts in French).
Is there a way to remove this "translation" feature?
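For reference, this is roughly the kind of call I mean (a sketch with the open-source whisper package; the model size and file name are placeholders):

```python
import whisper

model = whisper.load_model("large")
result = model.transcribe(
    "interview.mp3",
    task="transcribe",  # "translate" would translate everything into English instead
    # language="fr",    # forcing a language skips auto-detection for the whole file
)
print(result["text"])
```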
Thx