
[REQ] WhisperX support #38

Open
ch826 opened this issue May 28, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@ch826

ch826 commented May 28, 2023

First, I want to thank the author of this tool for simplifying the process of using OpenAI Whisper. Thanks to you, Fauzan, far more people are able to use the features of Whisper via a clean GUI.

As a feature request, I would love to see your program support the enhancements introduced by WhisperX (https://github.com/m-bain/whisperX), a greatly improved version of OpenAI's Whisper.

WhisperX, developed by researchers at the University of Oxford, transcribes at up to 70x real-time speed with the large-v2 model, needs much less GPU memory to run the models, has a lower word error rate, and greatly reduces the hallucinations, drifting and repetitions that standard Whisper is prone to. It detects silence, and it can also detect when there are multiple speakers and label each one separately (speaker diarization), even with overlapping voices. It also produces far more accurate timestamps, at the word level and optionally down to individual characters.

As it processes a recording, it splits the audio into chunks of up to 30 seconds and then transcribes them in parallel batches, which gives a dramatic speed increase. This appears to differ from WhisperJAX (https://github.com/sanchit-gandhi/whisper-jax): the released version of WhisperJAX splits the audio for batch processing without regard to context, so cuts sometimes fall in the middle of words. WhisperJAX then ends up transcribing partial words, which raises the word error rate. WhisperX avoids this by running voice activity detection before splitting, finding where speech starts and stops so that cuts fall in the silences between words.
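To illustrate the difference, here is a minimal sketch of "cut at silence, then merge" chunking. It assumes speech regions have already been found by some voice activity detector (VAD) and are given as non-overlapping `(start, end)` pairs in seconds; the function name `merge_speech_segments` and the greedy strategy are illustrative assumptions, not WhisperX's actual code.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec) of detected speech


def merge_speech_segments(segments: List[Segment], max_len: float = 30.0) -> List[Segment]:
    """Greedily merge VAD speech segments into chunks no longer than max_len.

    Assuming non-overlapping input segments, every chunk boundary falls in a
    silence gap between two segments, never inside a segment. A single
    segment longer than max_len is kept whole here; a real system would have
    to split it some other way.
    """
    chunks: List[Segment] = []
    cur_start: Optional[float] = None
    cur_end: Optional[float] = None
    for start, end in sorted(segments):
        if cur_start is None:
            # First segment opens the first chunk.
            cur_start, cur_end = start, end
        elif end - cur_start <= max_len:
            # Segment still fits: extend the current chunk.
            cur_end = max(cur_end, end)
        else:
            # Segment would overflow: close the chunk at the preceding silence.
            chunks.append((cur_start, cur_end))
            cur_start, cur_end = start, end
    if cur_start is not None:
        chunks.append((cur_start, cur_end))
    return chunks


print(merge_speech_segments([(0.0, 12.0), (13.5, 25.0), (26.0, 40.0), (41.0, 50.0)]))
```

With the sample input, the cut lands in the 25.0 to 26.0 s silence rather than at exactly 30 s, giving `[(0.0, 25.0), (26.0, 50.0)]`; the resulting chunks could then be batched for transcription.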

I have been reading that WhisperX does a much better job than OpenAI's version across many languages. That makes me think that continuing with the standard Whisper I have been using is fairly pointless: the results would be inferior to WhisperX and I would need to redo them later.

The problem is that I have been unable to get WhisperX running properly on my machine. I don't know which version or update of which dependency broke the installation. I have reinstalled things multiple times and spent many hours troubleshooting it, and I know many others are experiencing similar problems. It would be great if you could add support for WhisperX, or even for Faster-Whisper (https://github.com/guillaumekln/faster-whisper), which is not as advanced as WhisperX but is still an improvement over regular Whisper.

Ideally, users would have the option to choose between OpenAI's standard version and a huge improvement like WhisperX. Combining the improvements of WhisperX with your GUI would be wonderful!

More info:
- https://github.com/m-bain/whisperX (WhisperX GitHub source)
- https://www.slashcam.com/news/single/WhisperX--Free-audio-transcription-with-speaker-re-17704.html (slashcam news article on WhisperX)
- https://web.archive.org/web/20230301023005/https://www.swyx.io/transcribe-podcasts-with-whisper (archived swyx.io post on transcribing podcasts with Whisper)
- https://arxiv.org/abs/2303.00747 (WhisperX paper on arXiv)

ch826 added the enhancement (New feature or request) label May 28, 2023
@Dadangdut33
Owner

Dadangdut33 commented Aug 3, 2023

That is long and very detailed, thank you. And yeah, I'll try to add WhisperX later on, because I would like to let the user choose which backend to use as an option.

I have tried adding it before, but it seems it would need a lot of refactoring, so I decided to use whisper_timestamped / stable-ts for now. Development has also been kind of slow lately because I have a lot of personal stuff going on aside from developing this app.

Thanks for the suggestion!
