Add a function that writes cluster label in diarization pipeline #3643

Merged
tango4j merged 10 commits into main from clustering_emb_save_json on Feb 14, 2022

Conversation

tango4j
Collaborator

@tango4j tango4j commented Feb 10, 2022

What does this PR do?

This PR adds a function that writes cluster labels to a file. The cluster labels (one speaker label per segment) are needed to provide estimated segment labels to the overlap-aware diarization module or to target-speaker ASR.

Collection: asr (but actually speaker_tasks)

Changelog

  • Removed an unnecessary line in speaker_utils.py.
  • Added a line that collects the segment info and cluster label for each segment.
  • Added lines that write the cluster label files to speaker_outputs.
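
As a rough illustration of the changelog above, here is a minimal sketch of writing one speaker label per segment to a file. It is not the code from this PR: the function name `write_cluster_labels`, its parameters, the file name `cluster_labels.txt`, and the `<start> <end> speaker_<label>` line format are all assumptions made for this example. The file is opened in write mode so that rerunning the pipeline overwrites the file instead of appending to it.

```python
import os
import tempfile


def write_cluster_labels(segments, labels, out_dir, filename="cluster_labels.txt"):
    """Write one '<start> <end> speaker_<label>' line per segment (hypothetical format)."""
    if len(segments) != len(labels):
        raise ValueError("segments and labels must have the same length")
    os.makedirs(out_dir, exist_ok=True)
    out_path = os.path.join(out_dir, filename)
    # "w" mode: a second run replaces the file rather than appending duplicates.
    with open(out_path, "w") as f:
        for (start, end), label in zip(segments, labels):
            f.write(f"{start:.3f} {end:.3f} speaker_{label}\n")
    return out_path


# Usage with made-up segments; the directory name mirrors speaker_outputs.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_dir = os.path.join(tmp_dir, "speaker_outputs")
    segments = [(0.0, 1.5), (1.5, 3.0)]
    labels = [0, 1]
    write_cluster_labels(segments, labels, out_dir)
    # Writing twice still leaves exactly one line per segment.
    path = write_cluster_labels(segments, labels, out_dir)
    with open(path) as f:
        written_lines = f.read().splitlines()
```

A downstream module could then read the file back line by line to recover the estimated speaker for each segment.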

Usage

Please refer to Diarization docs

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Signed-off-by: Taejin Park <[email protected]>
@tango4j tango4j closed this Feb 10, 2022
@tango4j tango4j reopened this Feb 10, 2022
nithinraok
nithinraok previously approved these changes Feb 10, 2022
Collaborator

@nithinraok nithinraok left a comment

Thanks LGTM!

@tango4j tango4j merged commit 6a517f0 into main Feb 14, 2022
fayejf pushed a commit that referenced this pull request Mar 2, 2022
* Added cluster label function

Signed-off-by: Taejin Park <[email protected]>

* Fixed style issue

Signed-off-by: Taejin Park <[email protected]>

* Changed append -> write when open()

Signed-off-by: Taejin Park <[email protected]>

* Style fix

Signed-off-by: Taejin Park <[email protected]>

* Update README.md for cluster label file

Signed-off-by: Taejin Park <[email protected]>

Co-authored-by: Nithin Rao <[email protected]>
@tango4j tango4j deleted the clustering_emb_save_json branch April 14, 2022 23:26