MSDD Model __call__ method throws error #7975
Comments
Hi, @benhoff. This call function is designed for the Hugging Face API, to minimize the preparation steps. We will add error handling for this kind of illegal call.
Hey @tango4j! I've been following your work recently as I chunk through some of this stuff, and I'm looking forward to seeing the online diarizer PR land, as it's closer to my use case. I'm looking to diarize short chunks of audio (<30 seconds) on a server. I've wrapped this code base with an API and I'm repeatedly rewriting/updating the manifest.json to use the Is that the recommended way to do that? Or would it be better to use the
@benhoff An online diarization system is more sophisticated: it needs a history buffer mechanism to memorize past speaker profiles. We won't be designing the online diarizer for such a long (30 s) buffer; it will take 1-2 seconds of frame inputs. However, you will be able to tweak the system to serve your purpose. The Part-2 PR will include a tutorial, an example, and a YAML file. For the time being, I think running offline diarization on all the accumulated audio is the only way. You might want to match the speakers among multiple sequential outputs.
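A minimal sketch of the "match the speakers among multiple sequential outputs" idea, in case it helps. It assumes you can get one embedding vector per speaker from each offline run; how you obtain those vectors is not shown. The function names, the `global_N` labels, and the 0.7 threshold are all hypothetical and not part of NeMo. Matching is greedy on cosine similarity, which is a simplification rather than the method the maintainers use.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_speakers(
    known: Dict[str, List[float]],
    chunk: Dict[str, List[float]],
    threshold: float = 0.7,  # hypothetical value, tune on your own data
) -> Dict[str, str]:
    """Map chunk-local speaker labels to global labels.

    Pairs are matched greedily, best similarity first, one-to-one.
    Chunk speakers with no match above the threshold are registered
    in `known` as new global speakers.
    """
    pairs = sorted(
        (
            (cosine(emb, known_emb), local, global_label)
            for local, emb in chunk.items()
            for global_label, known_emb in known.items()
        ),
        reverse=True,
    )
    mapping: Dict[str, str] = {}
    used = set()
    for score, local, global_label in pairs:
        if score < threshold:
            break  # remaining pairs are all weaker
        if local in mapping or global_label in used:
            continue
        mapping[local] = global_label
        used.add(global_label)
    for local, emb in chunk.items():
        if local not in mapping:
            new_label = f"global_{len(known)}"
            known[new_label] = emb
            mapping[local] = new_label
    return mapping


# Toy 2-D embeddings, for illustration only.
known = {"global_0": [1.0, 0.0], "global_1": [0.0, 1.0]}
chunk = {"speaker_0": [0.1, 0.9], "speaker_1": [0.9, 0.1], "speaker_2": [-1.0, -1.0]}
mapping = match_speakers(known, chunk)
```

Each offline run labels its speakers independently, so `speaker_0` in one chunk need not be `speaker_0` in the next; the mapping is what keeps labels stable across chunks.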
@tango4j, no worries on the PR. Good to know that the use case is targeting 1-2 seconds; I won't wait for it to land and will take the offline approach you suggested instead. I was thinking about implementing the historical buffer myself, but I was surprised that most of the approaches don't automatically prune some of the more spurious or smaller data (for example, everything less than 0.5 seconds). Maybe the algorithmic clustering approaches do this via math, but in a meeting use case, my gut instinct is to throw away some of the noise and only keep longer, more robust phrases so that you can get a clean fingerprint of someone's voice.
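For what it's worth, the pruning described above can be sketched as a plain duration filter over diarization segments. This assumes segments are available as (start, end, speaker) tuples in seconds; the `Segment` type, the function name, and the sample data are hypothetical, and the 0.5 second cutoff is just the figure mentioned in the comment.

```python
from typing import List, Tuple

# (start_sec, end_sec, speaker_label); a hypothetical layout for this sketch
Segment = Tuple[float, float, str]


def prune_short_segments(
    segments: List[Segment], min_duration: float = 0.5
) -> List[Segment]:
    """Keep only segments that last at least `min_duration` seconds."""
    return [seg for seg in segments if seg[1] - seg[0] >= min_duration]


# Made-up segments, for illustration only.
segments = [
    (0.0, 0.3, "speaker_0"),  # 0.3 s, dropped
    (0.3, 2.5, "speaker_1"),  # 2.2 s, kept
    (2.5, 2.9, "speaker_0"),  # 0.4 s, dropped
    (3.0, 6.0, "speaker_0"),  # 3.0 s, kept
]
kept = prune_short_segments(segments)
```

Only the surviving segments would then feed the per-speaker voice profile, so short interjections and noise bursts don't dilute it.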
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
Describe the bug
If you use the __call__ method from NeuralDiarizer instead of the diarize method, you get the below stack trace.

Steps/Code to reproduce bug
If you go to the Speaker_Diarization_Inference.ipynb file and run everything, you can change the code to system_vad_msdd_model('data/an4_diarize_test.wav'), and the error will get thrown.