@patrickvonplaten patrickvonplaten commented Oct 11, 2021

What does this PR do?

This PR adds UniSpeech from Microsoft: https://github.com/microsoft/UniSpeech

TODOS:

Future PR:

  • Correct pretraining loss

@patrickvonplaten
Contributor Author

Wait until #13877 is merged

@patrickvonplaten
Contributor Author

PR is good for review IMO:

@patrickvonplaten
Contributor Author

I think we can merge the pretrained models now. To make them "promotable" we should still do two things:

  • UniSpeech: add a phoneme <-> text tokenizer; this needs some feedback from the authors
  • UniSpeech-SAT: the model should work very well for speaker verification and speaker diarization. We should add those two tasks and then promote the model on them

Collaborator

@sgugger sgugger left a comment

Thanks a lot for adding those two models!

Member

@anton-l anton-l left a comment

Looks good, thanks a lot for debugging the original models!


# quantize all (unmasked) extracted features and project to final vq dim
extract_features = self.dropout_features(outputs[1])
quantized_features, codevector_perplexity = self.quantizer(extract_features)
Member

@anton-l anton-l Oct 25, 2021

Doesn't UniSpeech use the same masking strategy for quantization as Wav2Vec? Or did you remove masking just for debugging purposes?

Contributor Author

Pretraining is quite different and not really implemented yet, so this code should not be used yet.
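
For readers following this thread, here is a toy sketch of one place a time mask can enter a quantization step: every frame is quantized, but the codebook-usage statistic (perplexity) is computed either over all frames or only over the masked ones. This is an illustration under stated assumptions, not the code in this PR and not the UniSpeech or Wav2Vec pretraining objective. The nearest-neighbour lookup, the `quantize` function, its `mask` argument, and the example data are all hypothetical.

```python
import math
from collections import Counter


def nearest_code(frame, codebook):
    """Return the index of the codebook vector closest to `frame` (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((f - c) ** 2 for f, c in zip(frame, codebook[i])),
    )


def quantize(frames, codebook, mask=None):
    """Quantize every frame and report codebook perplexity.

    Hypothetical helper: if `mask` is given, only frames whose mask entry is
    True contribute to the perplexity; the quantized output always covers
    all frames.
    """
    indices = [nearest_code(frame, codebook) for frame in frames]
    quantized = [codebook[i] for i in indices]

    # Choose which frames are counted in the usage statistic.
    if mask is None:
        counted = indices
    else:
        counted = [i for i, keep in zip(indices, mask) if keep]
    if not counted:
        raise ValueError("mask selects no frames")

    # Perplexity is exp(entropy) of the empirical code distribution.
    total = len(counted)
    entropy = -sum(
        (n / total) * math.log(n / total) for n in Counter(counted).values()
    )
    return quantized, math.exp(entropy)


# Made-up data for illustration only.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
frames = [[0.1, 0.0], [0.9, 1.2], [3.8, 4.1], [0.2, -0.1]]

quantized_all, ppl_all = quantize(frames, codebook)
quantized_masked, ppl_masked = quantize(
    frames, codebook, mask=[True, False, False, True]
)
print(ppl_all, ppl_masked)
```

In this toy setup the quantized output is identical in both calls; only the perplexity changes, because the masked call counts just the two frames that map to the same code.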

@patrickvonplaten patrickvonplaten changed the title from "Add Unispeech" to "Add Unispeech & Unispeech-SAT" Oct 26, 2021
@patrickvonplaten patrickvonplaten merged commit 9f3aa46 into huggingface:master Oct 26, 2021
@patrickvonplaten patrickvonplaten deleted the add_unispeech branch October 26, 2021 17:00