Add the SEW and SEW-D speech models #13962

anton-l · 2021-10-11T13:41:24Z

What does this PR do?

This PR adds the SEW and SEW-D models from the paper "Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition".

Source of the models: https://github.com/asappresearch/sew/

SEW is based on Wav2Vec2, but with time frame downsampling and upsampling around the transformer layers.
SEW-D replaces the transformer layers in SEW with a DeBERTa-v2 encoder.
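
To make the downsample/upsample idea concrete, here is a minimal sketch in plain Python rather than PyTorch. The name `squeeze_factor`, the use of average pooling for downsampling, and frame repetition for upsampling are illustrative assumptions, and the identity function stands in for the transformer layers. The real modules are learned layers; see `modeling_sew.py` for the actual implementation.

```python
from typing import Callable, List

Frame = List[float]  # one feature vector per time step


def downsample(frames: List[Frame], squeeze_factor: int) -> List[Frame]:
    """Average each group of `squeeze_factor` consecutive frames into one."""
    pooled = []
    for start in range(0, len(frames) - squeeze_factor + 1, squeeze_factor):
        group = frames[start:start + squeeze_factor]
        pooled.append([sum(col) / squeeze_factor for col in zip(*group)])
    return pooled


def upsample(frames: List[Frame], squeeze_factor: int) -> List[Frame]:
    """Repeat every frame `squeeze_factor` times to restore the time resolution."""
    return [list(frame) for frame in frames for _ in range(squeeze_factor)]


def squeezed_encoder(
    frames: List[Frame],
    encoder: Callable[[List[Frame]], List[Frame]],
    squeeze_factor: int = 2,
) -> List[Frame]:
    """Run `encoder` on a shorter sequence, then return to the original rate."""
    short = downsample(frames, squeeze_factor)
    encoded = encoder(short)
    return upsample(encoded, squeeze_factor)


# Identity stands in for the transformer layers (hypothetical placeholder).
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
output = squeezed_encoder(frames, encoder=lambda x: x, squeeze_factor=2)
```

The point of the design is that the encoder only processes `len(frames) / squeeze_factor` time steps, which cuts the cost of self-attention, while the output keeps the original frame count.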

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in this PR.

TODO

model docs
checkpoints conversion
fine-tuned CTC checkpoints?

src/transformers/models/sew/modeling_sew.py

src/transformers/models/sew_d/modeling_sew_d.py

anton-l · 2021-10-11T13:51:18Z

The VQ pretraining modules aren't ported yet. After #13877 is merged they'll be added in a separate PR.

patrickvonplaten · 2021-10-12T19:55:38Z

Let's try to get this PR merged by Thursday/Friday - anything I can help with? :-)

patrickvonplaten

Nice!

An important next step would be to add all the public checkpoints to the hub. Note that we can also write integration tests for pretrained-only checkpoints.

patrickvonplaten · 2021-10-14T18:56:39Z

src/transformers/models/hubert/modeling_hubert.py

    def __init__(self, config, layer_id=0):
        super().__init__()
-        self.in_conv_dim = config.conv_dim[layer_id] if layer_id > 0 else 1
+        self.in_conv_dim = config.conv_dim[layer_id - 1] if layer_id > 0 else 1


Thanks! cc @mfuntowicz
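
A small sketch of why the index change above matters, written as a standalone helper rather than the actual module code. The function name `conv_channel_pairs` and the example channel sizes are hypothetical; only the `conv_dim[layer_id - 1] if layer_id > 0 else 1` expression comes from the diff.

```python
from typing import List, Tuple


def conv_channel_pairs(conv_dim: List[int]) -> List[Tuple[int, int]]:
    """Return (in_channels, out_channels) for each feature-extractor conv layer."""
    pairs = []
    for layer_id, out_dim in enumerate(conv_dim):
        # Layer 0 reads the raw single-channel waveform; every later layer
        # reads the output of the previous layer, hence `layer_id - 1`.
        in_dim = conv_dim[layer_id - 1] if layer_id > 0 else 1
        pairs.append((in_dim, out_dim))
    return pairs


# Hypothetical non-uniform channel sizes, chosen to expose the indexing bug.
pairs = conv_channel_pairs([64, 128, 256])
```

With the old `conv_dim[layer_id]` indexing, layer 1 in this example would expect 128 input channels while layer 0 only produces 64. The mismatch stays hidden whenever every entry of `conv_dim` is the same.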

patrickvonplaten · 2021-10-14T18:57:24Z

src/transformers/models/sew/__init__.py

+
+
+_import_structure = {
+    ".wav2vec2.feature_extraction_wav2vec2": ["Wav2Vec2FeatureExtractor"],


do we need that import here?

Probably not right now. This was used to enable AutoFeatureExtractor for audio classification pipelines with Hubert (#13366).

Ah I haven't done a good review of #13366 - we shouldn't have done this - sorry! Have we already done a release since merging this PR?

The AutoFeatureExtractor works out of the box by adding this line to the config:

transformers/tests/fixtures/preprocessor_config.json

Line 2 in d5b82bb

"feature_extractor_type": "Wav2Vec2FeatureExtractor"

We just need to add this to the configs, and I would be in favor of also deprecating the HuBERT key in the AutoFeatureExtractor and instead updating all HuBERT configs.

Also cc @sgugger what do you think?

The vision and speech APIs are both still experimental, so I'm fine with this small breaking change.

@anton-l - we can change HuBERT in a follow-up PR with a deprecation; let's try not to continue this design.
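
For reference, a minimal sketch of the config-based approach discussed in this thread, using only the standard library. The `feature_extractor_type` key and the `preprocessor_config.json` file name come from the thread; the temporary directory, the single-field config, and the lookup at the end are assumptions for illustration and do not reproduce the library's loading code.

```python
import json
import os
import tempfile

# Hypothetical minimal config; real preprocessor configs carry more fields
# (sampling rate, padding value, and so on).
preprocessor_config = {
    "feature_extractor_type": "Wav2Vec2FeatureExtractor",
}

# Write it the way a checkpoint directory would store it.
checkpoint_dir = tempfile.mkdtemp()
config_path = os.path.join(checkpoint_dir, "preprocessor_config.json")
with open(config_path, "w", encoding="utf-8") as f:
    json.dump(preprocessor_config, f, indent=2)

# Read it back and pick the class name, as an auto-loader could do
# before falling back to any per-model mapping.
with open(config_path, encoding="utf-8") as f:
    loaded = json.load(f)
extractor_class_name = loaded.get("feature_extractor_type")
```

Storing the class name in each checkpoint's config means a new model that reuses an existing feature extractor needs no extra entry in the auto-class mapping or in its own `__init__.py`.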

src/transformers/models/sew/__init__.py

src/transformers/models/sew/configuration_sew.py

src/transformers/models/sew/modeling_sew.py

patrickvonplaten · 2021-10-14T20:21:58Z

src/transformers/models/sew_d/modeling_sew_d.py

+        )
+
+
+class SEWDPreTrainedModel(PreTrainedModel):


gradient checkpointing here?

Ah ok, it relies on DeBERTa - eventually we should also add gradient checkpointing to DeBERTa so that we have it here.

But let's do it in another PR :-)

tests/test_modeling_sew.py

tests/test_modeling_sew_d.py

src/transformers/models/auto/configuration_auto.py


anton-l · 2021-10-15T13:11:27Z

@patrickvonplaten in the end I removed the feature_projection if-else and left the modules only in SEW-D.
The checkpoints are all uploaded now 🎉
https://huggingface.co/models?other=sew
https://huggingface.co/models?other=sew-d

src/transformers/models/auto/feature_extraction_auto.py

patrickvonplaten

Great looks good to me! Just two things:

Move the checkpoints to the official org
Remove sew from the AutoFeatureExtractor


anton-l added 3 commits October 6, 2021 18:43

Working encoder

01afce8

SEW-D and tests

a6ec41c

Further conv fixes

a31de65

anton-l commented Oct 11, 2021

src/transformers/models/sew/modeling_sew.py

anton-l commented Oct 11, 2021

src/transformers/models/sew_d/modeling_sew_d.py

anton-l marked this pull request as draft October 11, 2021 14:06

Automodels and conv inits

215d088

anton-l added 6 commits October 12, 2021 23:57

Merge remote-tracking branch 'upstream/master' into add-sew

4683399

Update integration tests, add docs

99e4333

Docs cleanup, resolve todos

3885417

Conf fix

e5ecda3

Merge remote-tracking branch 'upstream/master' into add-sew

1acbd60

Fix docs

23e4fd2

anton-l marked this pull request as ready for review October 14, 2021 14:53

anton-l requested a review from patrickvonplaten October 14, 2021 15:19

patrickvonplaten reviewed Oct 14, 2021


anton-l and others added 6 commits October 15, 2021 13:06

Fix tests, apply suggestions

57ec1f2

Update src/transformers/models/sew/modeling_sew.py

8a69931

Co-authored-by: Patrick von Platen

Merge branch 'add-sew' of https://github.com/anton-l/transformers into add-sew

e1c952c

Model conversion and updated no-mask tests

08307e3

Remove copy of feature_proj

b955780

Style

600eb85

anton-l requested a review from LysandreJik October 15, 2021 13:11

patrickvonplaten reviewed Oct 15, 2021

src/transformers/models/auto/feature_extraction_auto.py

patrickvonplaten reviewed Oct 15, 2021

src/transformers/models/auto/feature_extraction_auto.py

patrickvonplaten approved these changes Oct 15, 2021


Update src/transformers/models/auto/feature_extraction_auto.py

4b5d78d

Co-authored-by: Patrick von Platen

anton-l and others added 2 commits October 15, 2021 17:26

Update src/transformers/models/auto/feature_extraction_auto.py

401a940

Co-authored-by: Patrick von Platen

Move orgs

88c39be

anton-l merged commit cd3166a into huggingface:master Oct 15, 2021


