How to use speaker diarization without writing input to manifest and output to .rttm file ? #6271
Comments
You can perform speaker diarization without such a manifest/RTTM file with the following feature: try following the example in the PR above.
Has this feature not been integrated into the NeMo library released via pip? I installed the newest version (1.16 at this point) and realized that I cannot import NeuralDiarizer with `from nemo.collections.asr.models import NeuralDiarizer`, although the main branch on GitHub can do that.
I have read the code in the NeuralDiarizer class. I realized that the audio files must still be written to a manifest file. When I run multiple processes, they overwrite this file, which leads to conflicts. I think I will need to customize the code for my use case with monkey patching :(
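One way to sidestep the overwrite conflict without patching NeMo internals is to give each process its own manifest path. This is a sketch using only the standard library; the entry fields follow NeMo's usual diarization manifest convention, and the `cfg.diarizer.manifest_filepath` assignment shown in the comment is a hypothetical illustration of where the path would be plugged in:

```python
import json
import os
import tempfile

def write_process_local_manifest(audio_filepath, num_speakers=None):
    """Write a single-entry diarization manifest to a unique temp file.

    The fields mirror the manifest lines the NeMo diarization tutorial
    produces; num_speakers stays None when unknown. Returns the path.
    """
    entry = {
        "audio_filepath": audio_filepath,
        "offset": 0,
        "duration": None,          # None: use the full audio file
        "label": "infer",
        "text": "-",
        "num_speakers": num_speakers,
        "rttm_filepath": None,
        "uem_filepath": None,
    }
    # mkstemp guarantees a unique filename, so concurrent worker
    # processes never clobber each other's manifest; the PID suffix
    # just makes the owner obvious when debugging.
    fd, path = tempfile.mkstemp(suffix=f".{os.getpid()}.json")
    with os.fdopen(fd, "w") as f:
        f.write(json.dumps(entry) + "\n")
    return path

manifest = write_process_local_manifest("/data/call_001.wav")
# cfg.diarizer.manifest_filepath = manifest  # hypothetical config wiring
```

Each process then points its own diarizer config at its own manifest file, so no monkey patching of the file-writing code is needed.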
Thanks for your great repository. I have a question about speaker diarization.
I follow the inference steps in this tutorial: https://github.com/NVIDIA/NeMo/blob/main/tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb
According to the tutorial, the audio is written to a manifest file and the output is saved in .rttm format in out_dir (the manifest file and out_dir are defined in diar_infer_telephonic.yaml). Is there a direct way to pass the input as audio metadata (path, duration, ...) and get the results back without writing the manifest and .rttm files?
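Until a file-free interface lands in a pip release, the .rttm output can at least be consumed programmatically rather than handled as a file downstream. RTTM is a standard space-separated format, so a small parser recovers the speaker segments. This is a sketch; the recording name and speaker labels below are made-up examples, not NeMo output:

```python
from typing import Iterable, List, Tuple

def parse_rttm_lines(lines: Iterable[str]) -> List[Tuple[float, float, str]]:
    """Parse RTTM SPEAKER lines into (start, end, speaker_label) tuples.

    For SPEAKER records, field 4 is the onset in seconds, field 5 the
    duration, and field 8 the speaker label; other record types and
    blank lines are skipped.
    """
    segments = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue
        onset, dur = float(fields[3]), float(fields[4])
        segments.append((onset, onset + dur, fields[7]))
    return segments

rttm = [
    "SPEAKER call_001 1 0.00 3.25 <NA> <NA> speaker_0 <NA> <NA>",
    "SPEAKER call_001 1 3.50 2.25 <NA> <NA> speaker_1 <NA> <NA>",
]
print(parse_rttm_lines(rttm))
# → [(0.0, 3.25, 'speaker_0'), (3.5, 5.75, 'speaker_1')]
```

Reading the diarizer's output .rttm with `open(path).readlines()` and feeding it through this parser gives plain Python tuples to work with, though it still does not remove the intermediate file itself.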