Replies: 1 comment
-
Based on how diarization works in general, I'd guess yes, it's deterministic, but to be honest I haven't dug into the details of the architecture. The diarization process consists of three models: MarbleNet for VAD, TitaNet for speaker embeddings, and MSDD for diarization. You can probably get a solid answer by skimming the papers for these models, and I think it's safe to assume the process is deterministic given a constant configuration. I could pull the configuration from a specific commit to pin it against future changes, but since this project isn't widely used in research, we can just use the most recent configuration file, which is rarely updated anyway. As for the rest of the question, you can cache the RTTM file generated by NeMo and skip the whole diarization step to save time.
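A rough sketch of what caching the RTTM file could look like, assuming the output really is stable for a given audio file and configuration. The names `cache_key`, `cached_rttm`, and `run_diarizer` are hypothetical and not part of this project or NeMo; `run_diarizer` stands in for whatever wrapper actually invokes the diarizer and writes the RTTM file to the given path.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cache_key(audio_path: Path, config: dict) -> str:
    """Hash the audio bytes together with the configuration (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(audio_path, "rb") as f:
        # Read in 1 MiB chunks so large audio files are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    # sort_keys makes the hash independent of dict insertion order.
    digest.update(json.dumps(config, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


def cached_rttm(
    audio_path: Path,
    config: dict,
    cache_dir: Path,
    run_diarizer: Callable[[Path, Path], None],
) -> Path:
    """Return a cached RTTM path, running the diarizer only on a cache miss.

    run_diarizer(audio_path, rttm_path) is an assumed wrapper that must
    write its RTTM output to rttm_path.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    rttm_path = cache_dir / f"{cache_key(audio_path, config)}.rttm"
    if not rttm_path.exists():
        run_diarizer(audio_path, rttm_path)
    return rttm_path
```

Keying on both the audio contents and the configuration means a changed setting produces a new cache entry instead of silently reusing a stale result.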
-
Question regarding demucs and NeMo NeuralDiarizer: Are they deterministic?
In other words, given a particular audio file, will the results of demucs.prepare and NeuralDiarizer() be the same across runs, assuming the configuration stays the same?
I couldn't tell from reading NVIDIA's documentation.
If so, it seems like caching their results might be useful. I'm already finding that I often need to run the Whisper process multiple times with different settings to get good results, and bypassing the MSDD step on reruns would speed things up.
It also seems like mono_file.wav could be cached, but that's only a minor performance gain.
Inside create_config() for the NeMo setup, I also noticed that it uses wget to retrieve the YAML files from the NVIDIA repository. Perhaps these change occasionally? Even so, it seems like those three YAML files could be cached locally instead.
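To illustrate the local-caching idea for the YAML files, here is a minimal sketch that downloads a file only when no local copy exists. It does not reflect how create_config() is actually written; `cached_config` and `_download` are hypothetical names, the URL is supplied by the caller, and the standard library's `urllib.request.urlretrieve` is swapped in for wget.

```python
import urllib.request
from pathlib import Path
from typing import Callable


def _download(url: str, dest: Path) -> None:
    """Default fetcher: plain HTTP download via the standard library."""
    urllib.request.urlretrieve(url, str(dest))


def cached_config(
    url: str,
    cache_dir: Path,
    fetch: Callable[[str, Path], None] = _download,
) -> Path:
    """Return a local copy of the file at url, downloading it at most once.

    Hypothetical helper: the file name is taken from the last URL segment.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    dest = cache_dir / url.rsplit("/", 1)[-1]
    if not dest.exists():
        # Download to a temporary name first so an interrupted transfer
        # never leaves a truncated file under the final name.
        partial = dest.with_name(dest.name + ".part")
        fetch(url, partial)
        partial.replace(dest)
    return dest
```

Deleting the cached file forces a fresh download, which would cover the case where the upstream YAML files do change.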