-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add file class based inference API for diarization #5945
Conversation
I resolved the conflict in speaker_utils.py and tested with the script you provided. @SeanNaren I think it is required to add a simple CI-test for this too? I guess it won't be that difficult. |
Thanks guys. I have made modifications as per requested to the code.
Yep, once we're happy with the API and stuff, can start to add tests for functionality. The current code seems to be a one run setup; the code isn't setup for I'll be modifying this behaviour. |
High level updates:
|
I have sent you the branch
https://github.com/NVIDIA/NeMo/blob/feat/inference_diar_add/nemo/collections/asr/models/configs/msdd_config.py Few points that I strongly insist to change:
Please ping me if you need to clarify something. |
After offline discussion the high level changes are as below:
When changes to the defaults are made, they are made in the data-class config file (the python data-class). When changes regarding meeting or general are made, they are made in the yaml files. This means parameters are not duplicated, but there is two different places to update (which I think is fine as maintainers, as the config file is mainly to try different configurations out). One note to make, is that our neural diarizer model is telephonic. If we release further adapted models to different domains (which is not certain yet) we will need to come up with a mechanism via the API to enable this. |
I recognize that making data class is necessary but editing diarization yaml file this way is too confusing. Even I am having a hard time grasping what is happening with the configuration. Also, setting telephonic as a default configuration for the whole diarization pipeline is also confusing. Can we just duplicate the configuration (for now, it is telephonic) and just revert all yaml files? This is too much of change on the original diarization pipeline, which is only focusing on API purpose. |
Thanks @tango4j I have updated the PR based on your suggestions. Would we be able to review and merge this now? |
821dd11
to
995f5ea
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of points to be addressed.
Also it would be great if we can remove the scale-specific tqdm in clustering diarizer.
It is overkill to show it.
Thanks @tango4j everything addressed. I've reduced the TQDM logging by introducing verbosity to those functions. Everything left is just general NeMo logging, which also needs to definitely be reduced (but in a different PR I believe). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will do PRs when I need to update the configs later.
Thank you for your effort on building this.
Signed-off-by: SeanNaren <[email protected]>
7258142
to
a4d1f53
Compare
`NeuralDiarizer` | ||
""" | ||
cfg = NeuralDiarizerInferenceConfig.init_config( | ||
diar_model_path=model_name, vad_model_path=vad_model_name, map_location=map_location, verbose=verbose, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool
Signed-off-by: SeanNaren <[email protected]>
examples/speaker_tasks/diarization/conf/inference/diar_infer_general.yaml
Outdated
Show resolved
Hide resolved
examples/speaker_tasks/diarization/conf/inference/diar_infer_meeting.yaml
Outdated
Show resolved
Hide resolved
Signed-off-by: SeanNaren <[email protected]>
…load/set the weights/device Signed-off-by: SeanNaren <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work Sean.
Signed-off-by: hsiehjackson <[email protected]>
What does this PR do?
Supports:
User can change the device:
Supports passing the maximum number of speakers, or the actual number of speakers in a file (oracle):
Relies on a temporary directory, which can be specified if preferred:
List of available models:
TODO
NeuralDiarizer
will be a NeMo PT model file too, save for a later refactorCollection: ASR, Speaker Tasks
Changelog
Before your PR is "Ready for review"
Pre checks:
PR Type: