
@anton-l (Member) commented Oct 11, 2021

What does this PR do?

This PR adds the SEW and SEW-D models from the paper "Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition".

Source of the models: https://github.com/asappresearch/sew/

  • SEW is based on Wav2Vec2, but downsamples the sequence of time frames before the transformer layers and upsamples it again afterwards.
  • SEW-D replaces the transformer layers in SEW with a DeBERTa-v2 encoder.
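
A rough plain-Python sketch of the SEW squeeze/unsqueeze idea, under stated assumptions: frames are lists of floats, downsampling is average pooling over groups of a hypothetical `squeeze_factor` frames, the transformer is any function passed in (an identity function in the example), and upsampling simply repeats frames and pads or trims back to the input length. The real model may use a learned upsampling layer instead; this only illustrates the shape bookkeeping.

```python
from typing import Callable, List

Frame = List[float]  # one feature vector per time frame


def downsample(frames: List[Frame], squeeze_factor: int) -> List[Frame]:
    """Average each group of `squeeze_factor` consecutive frames (a trailing remainder is dropped)."""
    pooled = []
    for start in range(0, len(frames) - squeeze_factor + 1, squeeze_factor):
        window = frames[start:start + squeeze_factor]
        pooled.append([sum(column) / squeeze_factor for column in zip(*window)])
    return pooled


def upsample(frames: List[Frame], squeeze_factor: int, target_len: int) -> List[Frame]:
    """Repeat each frame `squeeze_factor` times, then pad (with the last frame) or trim to `target_len`."""
    if not frames:
        raise ValueError("cannot upsample an empty sequence")
    repeated = [list(frame) for frame in frames for _ in range(squeeze_factor)]
    while len(repeated) < target_len:
        repeated.append(list(repeated[-1]))
    return repeated[:target_len]


def sew_style_encoder(
    frames: List[Frame],
    squeeze_factor: int,
    transformer: Callable[[List[Frame]], List[Frame]],
) -> List[Frame]:
    """Squeeze, contextualize, unsqueeze; returns as many frames as were passed in."""
    if len(frames) < squeeze_factor:
        raise ValueError("need at least `squeeze_factor` frames")
    squeezed = downsample(frames, squeeze_factor)
    contextualized = transformer(squeezed)  # sees only len(frames) // squeeze_factor frames
    return upsample(contextualized, squeeze_factor, len(frames))
```

For example, `sew_style_encoder([[1.0], [3.0], [5.0], [7.0], [9.0]], 2, lambda x: x)` averages pairs of frames into `[[2.0], [6.0]]` and expands them back to five frames. The efficiency gain comes from the transformer in the middle, whose self-attention cost grows quadratically with sequence length, operating on a shorter sequence.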

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in this PR.

TODO

  • model docs
  • checkpoint conversion
  • finetuned CTC checkpoints?

@anton-l (Member Author) commented Oct 11, 2021

The VQ pretraining modules aren't ported yet. After #13877 is merged, they'll be added in a separate PR.

@anton-l anton-l marked this pull request as draft October 11, 2021 14:06

@patrickvonplaten (Contributor)

Let's try to get this PR merged by Thursday/Friday - anything I can help with? :-)

@anton-l anton-l marked this pull request as ready for review October 14, 2021 14:53

@patrickvonplaten (Contributor) left a comment

Nice!

An important next step would be to add all the public checkpoints to the Hub. Note that we can also add integration tests for pretrained-only checkpoints.

 def __init__(self, config, layer_id=0):
     super().__init__()
-    self.in_conv_dim = config.conv_dim[layer_id] if layer_id > 0 else 1
+    self.in_conv_dim = config.conv_dim[layer_id - 1] if layer_id > 0 else 1
Contributor

Thanks! cc @mfuntowicz
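
The suggested `layer_id - 1` fix follows from how a stack of 1-D convolutions is chained: layer `i` consumes the channels produced by layer `i - 1`, and the first layer reads the raw single-channel waveform. A small plain-Python sketch, assuming `conv_dim` is a sequence of per-layer output channel counts (the example values are made up for illustration):

```python
from typing import List, Sequence, Tuple


def conv_channel_pairs(conv_dim: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (in_channels, out_channels) for each conv layer in the stack."""
    pairs = []
    for layer_id in range(len(conv_dim)):
        # Layer 0 reads the raw waveform (1 channel); every later layer reads
        # the output of the previous layer, hence conv_dim[layer_id - 1].
        in_conv_dim = conv_dim[layer_id - 1] if layer_id > 0 else 1
        out_conv_dim = conv_dim[layer_id]
        pairs.append((in_conv_dim, out_conv_dim))
    return pairs
```

With `conv_dim=(64, 128, 256)` this yields `[(1, 64), (64, 128), (128, 256)]`. When every layer has the same channel count (e.g. all 512), `conv_dim[layer_id]` and `conv_dim[layer_id - 1]` coincide, so the original indexing only breaks once the counts vary from layer to layer.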



_import_structure = {
".wav2vec2.feature_extraction_wav2vec2": ["Wav2Vec2FeatureExtractor"],
Contributor

Do we need that import here?

Member Author

Probably not right now; this was used to enable AutoFeatureExtractor for audio classification pipelines with HuBERT (#13366).

Contributor

Ah, I didn't review #13366 carefully enough; we shouldn't have done this, sorry! Have we already made a release since merging that PR?

AutoFeatureExtractor works out of the box if we add this line to the config:

"feature_extractor_type": "Wav2Vec2FeatureExtractor"

We just need to add this to the configs, and I would be in favor of also deprecating the HuBERT key in AutoFeatureExtractor and updating all HuBERT configs instead.
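
To illustrate resolving the extractor class from a key stored in the config, rather than from a per-model mapping, here is a toy registry in plain Python. The registry, the placeholder class, and `resolve_feature_extractor` are hypothetical and do not reproduce the actual `transformers` implementation; only the `feature_extractor_type` key and the `Wav2Vec2FeatureExtractor` name come from the discussion above.

```python
import json
from typing import Dict, Type


class Wav2Vec2FeatureExtractor:
    """Placeholder standing in for the real class of the same name."""


# Hypothetical registry: class name -> class.
FEATURE_EXTRACTORS: Dict[str, Type] = {
    "Wav2Vec2FeatureExtractor": Wav2Vec2FeatureExtractor,
}


def resolve_feature_extractor(config_json: str) -> Type:
    """Pick the extractor class named by the config's `feature_extractor_type` entry."""
    config = json.loads(config_json)
    name = config.get("feature_extractor_type")
    if name is None:
        raise ValueError("config has no 'feature_extractor_type' entry")
    try:
        return FEATURE_EXTRACTORS[name]
    except KeyError:
        raise ValueError(f"unknown feature extractor type: {name!r}") from None
```

Because the class is named in the config itself, a new speech model that reuses the Wav2Vec2 preprocessing (such as SEW) would need no extra entry keyed on its own model type.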

Also cc @sgugger what do you think?

Collaborator

The vision and speech APIs are both still experimental, so I'm fine with this small breaking change.

Contributor

@anton-l we can change HuBERT in a follow-up PR with a deprecation; let's try not to continue this design.
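
One way such a deprecation could look, sketched in plain Python under assumptions: a hypothetical legacy mapping keyed on `model_type` keeps working but emits a `FutureWarning`, while configs that set `feature_extractor_type` take precedence. `LEGACY_MODEL_TYPE_MAPPING` and `feature_extractor_name` are illustrative names, not the library's.

```python
import warnings
from typing import Dict, Optional

# Hypothetical legacy mapping from model type to extractor class name.
LEGACY_MODEL_TYPE_MAPPING: Dict[str, str] = {"hubert": "Wav2Vec2FeatureExtractor"}


def feature_extractor_name(config: Dict[str, str]) -> Optional[str]:
    """Prefer the explicit config key; fall back to the deprecated mapping with a warning."""
    name = config.get("feature_extractor_type")
    if name is not None:
        return name
    model_type = config.get("model_type")
    if model_type in LEGACY_MODEL_TYPE_MAPPING:
        warnings.warn(
            f"Resolving the feature extractor from model_type={model_type!r} is deprecated; "
            "add 'feature_extractor_type' to the config instead.",
            FutureWarning,
        )
        return LEGACY_MODEL_TYPE_MAPPING[model_type]
    # Unknown model type and no explicit key: nothing to resolve.
    return None
```

Old HuBERT configs keep loading during the transition, and the warning nudges users toward updated configs before the legacy key is removed.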

)


class SEWDPreTrainedModel(PreTrainedModel):
Contributor

Gradient checkpointing here?

Contributor

Ah OK, it relies on DeBERTa. Eventually we should add gradient checkpointing to DeBERTa so that SEW-D gets it as well.

Contributor

But let's do it in another PR :-)

@anton-l (Member Author) commented Oct 15, 2021

@patrickvonplaten in the end I removed the feature_projection if-else and left the modules only in SEW-D.
The checkpoints are all uploaded now 🎉
https://huggingface.co/models?other=sew
https://huggingface.co/models?other=sew-d

@anton-l anton-l requested a review from LysandreJik October 15, 2021 13:11

@patrickvonplaten (Contributor) left a comment

Great, looks good to me! Just two things:

  • Move the checkpoints to the official org
  • Remove SEW from AutoFeatureExtractor

@anton-l anton-l merged commit cd3166a into huggingface:master Oct 15, 2021