Add the SEW and SEW-D speech models #13962
The VQ pretraining modules aren't ported yet. After #13877 is merged they'll be added in a separate PR.
Let's try to get this PR merged by Thursday/Friday - anything I can help with? :-)
patrickvonplaten
left a comment
Nice!
An important next step would be to add all the publicly available checkpoints to the Hub - note that we can also write integration tests for pretrained-only checkpoints
    def __init__(self, config, layer_id=0):
        super().__init__()
-       self.in_conv_dim = config.conv_dim[layer_id] if layer_id > 0 else 1
+       self.in_conv_dim = config.conv_dim[layer_id - 1] if layer_id > 0 else 1
Thanks! cc @mfuntowicz
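The suggested change above concerns how each convolutional layer picks its input channel count: layer 0 reads the single-channel raw waveform, and every later layer reads the output of the layer before it. A minimal sketch of that rule, assuming `conv_dim` lists the output channels of each layer (the helper name `conv_channel_pairs` and the example values are hypothetical, not taken from the PR):

```python
from typing import List, Sequence, Tuple


def conv_channel_pairs(conv_dim: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (in_channels, out_channels) for each layer of a conv stack."""
    pairs = []
    for layer_id, out_dim in enumerate(conv_dim):
        # Layer 0 consumes the 1-channel waveform; layer i consumes
        # the output channels of layer i - 1.
        in_dim = conv_dim[layer_id - 1] if layer_id > 0 else 1
        pairs.append((in_dim, out_dim))
    return pairs


if __name__ == "__main__":
    print(conv_channel_pairs((64, 128, 256)))  # [(1, 64), (64, 128), (128, 256)]
```

Indexing with `conv_dim[layer_id]` instead would make layer 1 expect 128 input channels while layer 0 emits 64, so the shapes would only line up when adjacent entries happen to be equal.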
_import_structure = {
    ".wav2vec2.feature_extraction_wav2vec2": ["Wav2Vec2FeatureExtractor"],
do we need that import here?
Probably not right now, this was used to enable AutoFeatureExtractor for audio classification pipelines with Hubert #13366
Ah I haven't done a good review of #13366 - we shouldn't have done this - sorry! Have we already done a release since merging this PR?
The AutoFeatureExtractor works out of the box by adding this line to the config:
    "feature_extractor_type": "Wav2Vec2FeatureExtractor"
We just need to add this to the configs & I would be in favor of also deprecating the HuBERT key in the AutoFeatureExtractor and instead updating all HuBERT configs.
Also cc @sgugger what do you think?
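The config-driven lookup described in the comment above can be sketched without touching the library: a JSON config carries a `feature_extractor_type` string, and a loader resolves that string to a class. Everything in this sketch other than the key name and the class name `Wav2Vec2FeatureExtractor` is hypothetical (the registry, `resolve_feature_extractor`, the stand-in class, and the `sampling_rate` value); it illustrates the idea and is not the actual `AutoFeatureExtractor` implementation.

```python
import json


class Wav2Vec2FeatureExtractor:
    """Stand-in for the real class so the sketch runs on its own."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


# Hypothetical name -> class registry, assumed for illustration only.
FEATURE_EXTRACTOR_REGISTRY = {"Wav2Vec2FeatureExtractor": Wav2Vec2FeatureExtractor}


def resolve_feature_extractor(config_json: str):
    """Instantiate the class named by the config's 'feature_extractor_type' key."""
    config = json.loads(config_json)
    type_name = config.pop("feature_extractor_type", None)
    if type_name is None:
        raise ValueError("config has no 'feature_extractor_type' key")
    if type_name not in FEATURE_EXTRACTOR_REGISTRY:
        raise ValueError(f"unknown feature extractor type: {type_name}")
    # Remaining keys are passed through as constructor arguments.
    return FEATURE_EXTRACTOR_REGISTRY[type_name](**config)


config_json = '{"feature_extractor_type": "Wav2Vec2FeatureExtractor", "sampling_rate": 16000}'
extractor = resolve_feature_extractor(config_json)
```

With this shape, a model that reuses an existing feature extractor only needs the extra key in its config, rather than a model-specific entry in the auto class.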
The vision and speech APIs are both still experimental, so I'm fine with this small breaking change.
@anton-l - we can change HuBERT in a follow-up PR with Deprecation, let's try to not continue the design
)

class SEWDPreTrainedModel(PreTrainedModel):
gradient checkpointing here?
Ah ok it relies on DeBERTa - eventually we should also add gradient checkpointing (to DeBERTa to have it here)
But let's do it in another PR :-)
Co-authored-by: Patrick von Platen <[email protected]>
@patrickvonplaten in the end I removed the |
patrickvonplaten
left a comment
Great looks good to me! Just two things:
- Move the checkpoints to the official org
- Remove sew from the AutoFeatureExtractor
What does this PR do?
This PR adds the SEW and SEW-D models from the paper "Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition".
Source of the models: https://github.com/asappresearch/sew/
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in this PR.
TODO