caching deterministic portions of a transcription run #72

filmo · 2023-08-05T23:41:20Z


Question regarding demucs and NeMo NeuralDiarizer: Are they deterministic?

In other words, given a particular audio file, will the results of demucs.prepare and NeuralDiarizer() be consistent across runs, assuming the configuration stays the same?

I couldn't tell from reading NVIDIA's documentation.

If so, it seems like caching its results might be useful. I'm already finding that I often need to run the Whisper process multiple times with different settings to get good results, and skipping the MSDD step on those reruns would save time.

It also seems like mono_file.wav could be cached, but that's only a minor performance gain.

Inside create_config() for the NeMo setup, I also noticed that it uses wget to retrieve the YAML files from the NVIDIA repository. Perhaps these change occasionally, but it seems like those three YAML files could be cached locally instead.
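A minimal sketch of the "download once, then reuse" idea for the config files. It assumes the standard library's urllib in place of wget, and the helper name cached_download and its parameters are hypothetical; nothing here is taken from the project's actual create_config().

```python
import os
import urllib.request
from pathlib import Path


def cached_download(url, cache_dir, fetch=urllib.request.urlretrieve):
    """Return a local path for `url`, downloading only if it is not cached yet.

    `fetch(url, destination)` is injectable so the helper can be exercised
    offline; by default it is urllib.request.urlretrieve.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Use the last path component of the URL as the local file name.
    target = cache_dir / url.rsplit("/", 1)[-1]

    if not target.exists():
        # Download to a temporary name first, then rename, so an interrupted
        # download never leaves a partial file under the final name.
        tmp = target.with_name(target.name + ".part")
        fetch(url, str(tmp))
        os.replace(tmp, target)

    return target
```

Deleting the cached file forces a fresh download, which would cover the case where the upstream YAML does change.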

MahmoudAshraf97 · 2023-08-05T23:57:10Z

Maintainer

Based on how diarization works in general, I'd guess yes, it's deterministic, but I haven't dug into the details of the architecture, to be honest. The diarization process consists of three models: MarbleNet for VAD, TitaNet for speaker embeddings, and MSDD for diarization. You can probably get a solid answer by skimming the papers for these models, and I think it's safe to assume the process is deterministic given a constant configuration. I could pin the configuration to a specific commit to guard against future changes, but since this project isn't widely used in research, we can keep using the latest configuration file, which is rarely updated anyway.
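Rather than assuming determinism, one could check it empirically by diarizing the same file twice and comparing the two RTTM outputs. The sketch below assumes the standard RTTM layout (SPEAKER lines with onset in the 4th field, duration in the 5th, and speaker name in the 8th); the function names and the tolerance are illustrative, and it also assumes speaker labels are assigned in the same order on both runs.

```python
def parse_rttm(text):
    """Extract (onset, duration, speaker) tuples from RTTM text."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that is not a SPEAKER record.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        segments.append((float(fields[3]), float(fields[4]), fields[7]))
    return segments


def same_diarization(rttm_a, rttm_b, tol=1e-3):
    """True if both RTTM texts contain the same segments within `tol` seconds."""
    a, b = parse_rttm(rttm_a), parse_rttm(rttm_b)
    if len(a) != len(b):
        return False
    return all(
        spk_a == spk_b
        and abs(on_a - on_b) <= tol
        and abs(dur_a - dur_b) <= tol
        for (on_a, dur_a, spk_a), (on_b, dur_b, spk_b) in zip(a, b)
    )
```

A small tolerance is used instead of a byte-for-byte comparison because tiny floating-point differences between runs would not matter for caching purposes.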

As for the rest of the question, you can cache the RTTM file generated by NeMo and skip the whole diarization step to save time.
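A rough sketch of what that caching could look like, keyed on the audio content plus the configuration so that a change to either triggers a fresh run. Everything here is an assumption for illustration: run_diarization is a hypothetical callable standing in for the NeMo step and is expected to return the path of the RTTM file it wrote, and config is assumed to be a JSON-serializable dict.

```python
import hashlib
import json
import shutil
from pathlib import Path


def cache_key(audio_path, config):
    """Hash the audio bytes together with the (sorted) configuration."""
    h = hashlib.sha256()
    with open(audio_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    h.update(json.dumps(config, sort_keys=True).encode("utf-8"))
    return h.hexdigest()


def cached_rttm(audio_path, config, cache_dir, run_diarization):
    """Return a cached RTTM path, running diarization only on a cache miss.

    `run_diarization(audio_path, config)` is a hypothetical stand-in for the
    real diarization call; it must return the path of the RTTM it produced.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / (cache_key(audio_path, config) + ".rttm")

    if not cached.exists():
        produced = run_diarization(audio_path, config)
        shutil.copyfile(produced, cached)

    return cached
```

With something like this in place, repeated Whisper runs with different settings would reuse the same RTTM as long as the audio and diarization configuration are unchanged.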
