Silences are not detected, Whisper thinks speech is non-stop #2321

Open
manumaan opened this issue Jul 25, 2024 · 2 comments
manumaan commented Jul 25, 2024

I want to start by saying a big thank you for making it possible to create subtitles in a very short time on a Mac M1 using the GPU.

But there is one problem that bugs me.
Take the climax of a mystery movie, where the detective is about to name the murderer.

The actual script goes like this:

1:30:00 - 1:30:15 "And so the murderer is.."
Silence of 10 seconds while the camera sweeps over the suspects
1:30:25 - 1:30:28 "The Butler!"

But when transcribed by Whisper, the silence is not taken into account. The second line of dialogue starts right as the first one ends.

1:30:00 - 1:30:15 "And so the murderer is.."
1:30:16 - 1:30:28 "The Butler!" (The camera sweeps over the suspects for 10 seconds of this span, with no speech in the movie)

The words are detected very accurately (most of the time), but Whisper does not handle silences well. It starts the next line of dialogue as soon as the previous one ends, so the subtitles need some manual fixing. Is there any way to fix this?
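
To illustrate the kind of manual fix described above, here is a minimal post-processing sketch. It is not part of Whisper; the function names, the `(start, end, text)` segment layout, the 30 ms frame size, and the 0.01 energy threshold are all assumptions made for the example. It scans the audio inside each segment and moves the segment's start time forward to the first frame whose RMS energy exceeds the threshold, so a subtitle no longer begins during leading silence.

```python
from typing import List, Sequence, Tuple

# Hypothetical segment layout: (start_seconds, end_seconds, text)
Segment = Tuple[float, float, str]


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of a non-empty frame of samples."""
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


def find_speech_start(
    samples: Sequence[float],
    sample_rate: int,
    start_s: float,
    end_s: float,
    frame_ms: int = 30,       # assumed frame size
    threshold: float = 0.01,  # assumed energy threshold, tune per recording
) -> float:
    """Return the time of the first frame in [start_s, end_s) whose RMS
    exceeds the threshold, or start_s unchanged if no such frame exists."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    i = int(start_s * sample_rate)
    stop = min(int(end_s * sample_rate), len(samples))
    while i + frame_len <= stop:
        if rms(samples[i:i + frame_len]) > threshold:
            return i / sample_rate
        i += frame_len
    return start_s


def trim_leading_silence(
    segments: List[Segment],
    samples: Sequence[float],
    sample_rate: int,
    **kwargs,
) -> List[Segment]:
    """Move each segment's start forward past any leading silence."""
    return [
        (find_speech_start(samples, sample_rate, start, end, **kwargs), end, text)
        for start, end, text in segments
    ]
```

This assumes mono audio already decoded to floats in the range -1.0 to 1.0. A plain energy threshold will treat background music or noise as speech, so for a movie soundtrack the result would likely still need a manual check.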

@thewh1teagle
Contributor

This PR should fix it: #2279

@manumaan
Author

That would be great!
