
Conversation

@BigBird01
Contributor

Add the DeBERTa model to HF Transformers. DeBERTa applies two techniques to improve on RoBERTa: disentangled attention and an enhanced mask decoder. With 80GB of training data, DeBERTa outperforms RoBERTa on a majority of NLU tasks, e.g. SQuAD, MNLI and RACE. Paper link: https://arxiv.org/abs/2006.03654
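For readers unfamiliar with the paper, a minimal conceptual sketch of the disentangled attention score (illustrative only, not the code in this PR; the shapes and names are assumptions, and the relative-position projections are treated as pre-gathered per-pair tensors for simplicity):

import torch

def disentangled_scores(q_c, k_c, q_r, k_r):
    # q_c, k_c: content projections, shape (batch, seq, dim)
    # q_r, k_r: pre-gathered relative-position projections, shape (batch, seq, seq, dim)
    c2c = torch.einsum("bid,bjd->bij", q_c, k_c)   # content-to-content
    c2p = torch.einsum("bid,bijd->bij", q_c, k_r)  # content-to-position
    p2c = torch.einsum("bijd,bjd->bij", q_r, k_c)  # position-to-content
    return (c2c + c2p + p2c) / (3 * q_c.size(-1)) ** 0.5  # scaled by sqrt(3d), as in the paper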

@julien-c julien-c added the model card label Jul 21, 2020
@codecov

codecov bot commented Jul 21, 2020

Codecov Report

Merging #5929 into master will decrease coverage by 0.30%.
The diff coverage is 73.13%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5929      +/-   ##
==========================================
- Coverage   79.35%   79.05%   -0.31%     
==========================================
  Files         181      184       +3     
  Lines       35800    36660     +860     
==========================================
+ Hits        28410    28982     +572     
- Misses       7390     7678     +288     
Impacted Files Coverage Δ
src/transformers/activations.py 76.92% <50.00%> (-6.42%) ⬇️
src/transformers/tokenization_deberta.py 69.76% <69.76%> (ø)
src/transformers/modeling_deberta.py 73.26% <73.26%> (ø)
src/transformers/__init__.py 99.39% <100.00%> (+0.01%) ⬆️
src/transformers/configuration_auto.py 96.34% <100.00%> (+0.04%) ⬆️
src/transformers/configuration_deberta.py 100.00% <100.00%> (ø)
src/transformers/modeling_auto.py 87.11% <100.00%> (+0.06%) ⬆️
src/transformers/tokenization_auto.py 92.64% <100.00%> (+0.10%) ⬆️
src/transformers/modeling_tf_lxmert.py 22.14% <0.00%> (-72.41%) ⬇️
src/transformers/tokenization_pegasus.py 46.03% <0.00%> (-49.21%) ⬇️
... and 17 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1fc4de6...0a08565.

@julien-c julien-c added the model card and New model labels and removed the model card label Jul 21, 2020
@BigBird01 BigBird01 mentioned this pull request Jul 21, 2020
@BigBird01
Contributor Author

Related issue #4858

Contributor

@sshleifer sshleifer left a comment

Thanks for the contribution!
I left some comments. Most importantly, we can't add a deberta dependency.
Let me know if there's anything I can do to help :)

@mrm8488
Contributor

mrm8488 commented Jul 22, 2020

Someone is waiting to fine-tune a new model :)

@LysandreJik
Member

Very cool, looking forward to that model!!

@BigBird01
Contributor Author

Hello, may I know when the PR will be merged?

@sshleifer
Contributor

@BigBird01 it will probably take between 1 and 3 weeks to merge. 2,500 lines is a lot to review :)
I made some comments, and can take another pass on this when they're addressed.

@BigBird01
Contributor Author

> @BigBird01 it will probably take between 1 and 3 weeks to merge. 2,500 lines is a lot to review :)
> I made some comments, and can take another pass on this when they're addressed.

Thanks!

@LysandreJik
Member

Thanks for addressing the comments! Will take a look in a few days.

@LysandreJik LysandreJik self-requested a review September 3, 2020 09:09
Member

@LysandreJik LysandreJik left a comment

Great, it's nearly done! Thanks a lot for your work on it.

What's left to do is:

  • Ensure that the documentation is in the correct format
  • Enable the remaining tests

If you don't have time to work on it right now, let me know and I'll finish the implementation and merge it. Thanks!

README.md Outdated
Comment on lines 175 to 177
Member

Suggested change
-22. **[DeBERTa](https://huggingface.co/transformers/model_doc/deberta.html)** (from Microsoft Research) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
-25. **[Other community models](https://huggingface.co/models)**, contributed by the [community](https://huggingface.co/users).
-26. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedbacks before starting your PR.
+25. **[DeBERTa](https://huggingface.co/transformers/model_doc/deberta.html)** (from Microsoft Research) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
+26. **[Other community models](https://huggingface.co/models)**, contributed by the [community](https://huggingface.co/users).
+27. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedbacks before starting your PR.

Member

Is this necessary?

Member

Still don't understand if this is absolutely necessary or not

Comment on lines 37 to 62
Member

Would be great if this could be in the same style as the library's docstrings. See BERT's docstrings for example:

Args:
    vocab_size (:obj:`int`, optional, defaults to 30522):
        Vocabulary size of the BERT model. Defines the different tokens that
        can be represented by the `inputs_ids` passed to the forward method of :class:`~transformers.BertModel`.
    hidden_size (:obj:`int`, optional, defaults to 768):
        Dimensionality of the encoder layers and the pooler layer.
    num_hidden_layers (:obj:`int`, optional, defaults to 12):
        Number of hidden layers in the Transformer encoder.
    num_attention_heads (:obj:`int`, optional, defaults to 12):
        Number of attention heads for each attention layer in the Transformer encoder.
    intermediate_size (:obj:`int`, optional, defaults to 3072):
        Dimensionality of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
    hidden_act (:obj:`str` or :obj:`function`, optional, defaults to "gelu"):
        The non-linear activation function (function or string) in the encoder and pooler.
        If string, "gelu", "relu", "swish" and "gelu_new" are supported.
    hidden_dropout_prob (:obj:`float`, optional, defaults to 0.1):
        The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
    attention_probs_dropout_prob (:obj:`float`, optional, defaults to 0.1):
        The dropout ratio for the attention probabilities.
    max_position_embeddings (:obj:`int`, optional, defaults to 512):
        The maximum sequence length that this model might ever be used with.
        Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
    type_vocab_size (:obj:`int`, optional, defaults to 2):
        The vocabulary size of the `token_type_ids` passed into :class:`~transformers.BertModel`.
    initializer_range (:obj:`float`, optional, defaults to 0.02):
        The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
    layer_norm_eps (:obj:`float`, optional, defaults to 1e-12):
        The epsilon used by the layer normalization layers.
    gradient_checkpointing (:obj:`bool`, optional, defaults to :obj:`False`):
        If True, use gradient checkpointing to save memory at the expense of a slower backward pass.

Comment on lines 35 to 38
Member

We only support torch>=1.0.1 so there's no need for this assertion

Member

Should remove this check

Member

Suggested change
-There is an issue for tracing customer python torch Function, using this decorator to work around it.
+There is an issue for tracing custom python torch Function, using this decorator to work around it.
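For context, a hedged sketch of what such a work-around can look like (not the decorator from this PR; `MaskedSoftmax`, `traceable` and `masked_softmax` are made-up names): while torch.jit is tracing, route the call through built-in ops so the tracer never has to record the custom Function.

import torch

class MaskedSoftmax(torch.autograd.Function):
    # hypothetical custom Function of the kind being discussed
    @staticmethod
    def forward(ctx, scores, mask):
        # mask: bool tensor, True for positions to keep
        out = torch.softmax(scores.masked_fill(~mask, float("-inf")), dim=-1)
        ctx.save_for_backward(out)
        return out

    @staticmethod
    def backward(ctx, grad_out):
        (out,) = ctx.saved_tensors
        # standard softmax backward: out * (grad - sum(grad * out))
        return out * (grad_out - (grad_out * out).sum(dim=-1, keepdim=True)), None

def traceable(fallback):
    # decorator: use `fallback` (built-in ops only) whenever torch.jit is tracing
    def wrap(fn):
        def inner(*args, **kwargs):
            return fallback(*args, **kwargs) if torch.jit.is_tracing() else fn(*args, **kwargs)
        return inner
    return wrap

@traceable(lambda scores, mask: torch.softmax(scores.masked_fill(~mask, float("-inf")), dim=-1))
def masked_softmax(scores, mask):
    return MaskedSoftmax.apply(scores, mask)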

Comment on lines 1111 to 1146
Member

Same here regarding docs

Comment on lines 1207 to 1233
Member

Need to handle the hidden states and attentions there

Member

DeBERTa?

Suggested change
-A BERT sequence has the following format:
+A DeBERTa sequence has the following format:
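For reference, the layout the docstring describes is the standard single-sequence / sequence-pair format; a minimal sketch of the expected output (assumed behavior for illustration, not the PR's tokenizer code):

def build_inputs_with_special_tokens(cls_id, sep_id, ids_a, ids_b=None):
    # single sequence:    [CLS] A [SEP]
    # pair of sequences:  [CLS] A [SEP] B [SEP]
    if ids_b is None:
        return [cls_id] + ids_a + [sep_id]
    return [cls_id] + ids_a + [sep_id] + ids_b + [sep_id]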

Collaborator

Also please copy and adapt from the template/BERT tokenizer.

Member

Suggested change
-A BERT sequence pair mask has the following format:
+A DeBERTa sequence pair mask has the following format:
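The "sequence pair mask" here is the token_type_ids pattern: 0s for the first sequence, 1s for the second. A hedged sketch of the expected output (assumed behavior, not the PR's implementation):

def create_token_type_ids_from_sequences(ids_a, ids_b=None):
    # [CLS] A [SEP]          -> all zeros
    # [CLS] A [SEP] B [SEP]  -> zeros for the first segment, ones for the second
    if ids_b is None:
        return [0] * (len(ids_a) + 2)
    return [0] * (len(ids_a) + 2) + [1] * (len(ids_b) + 1)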

Collaborator

Also please copy and adapt from the template/BERT tokenizer.

Comment on lines 228 to 242
Member

We'll need to enable these for the merge

@BigBird01
Contributor Author

BigBird01 commented Sep 8, 2020

> Great, it's nearly done! Thanks a lot for your work on it.
>
> What's left to do is:
>
>   • Ensure that the documentation is in the correct format
>   • Enable the remaining tests
>
> If you don't have time to work on it right now, let me know and I'll finish the implementation and merge it. Thanks!

@LysandreJik Thanks for the comments. It would be great if you could work on the rest :) Feel free to let me know if you have any questions about the implementation.

@LysandreJik
Member

Okay @BigBird01, I have the PyTorch version ready, passing all tests and the docs cleaned up as well. Should I push directly on your branch, or do you want me to open a PR on your fork so that you can check my changes before applying them?

The value used to pad input_ids.
position_biased_input (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether to add absolute position embeddings to the content embeddings.
pos_att_type (:obj:`str`, `optional`, defaults to "None"):
Contributor

Would prefer to directly use a list of strings here instead of p2c|c2p|...; it is a bit strange that we do string operations in the modeling file such as .split("|"). What do you think, @LysandreJik?

Contributor Author

At the beginning we explored more options here, but we finally settled on p2c|c2p. It's OK to change it to a list.
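For illustration, the two config styles under discussion look roughly like this (hypothetical snippet, not the merged code):

# string style: the modeling code has to parse the value itself
pos_att_type = "p2c|c2p"
parsed = [t.strip() for t in pos_att_type.lower().split("|") if t]
# list style: no string parsing needed inside the model
pos_att_type = ["p2c", "c2p"]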

Contributor

@patrickvonplaten patrickvonplaten left a comment

Thanks a lot for your PR @BigBird01! I'm very excited to have a new attention mechanism in the library, and your results in the paper look awesome!

A couple of things I would like to change before merging:

  • I would be happier if the layer names could be renamed from Bert... -> DeBert... for better consistency. Also, I think we should delete the MaskedLayerNorm as explained below.
  • Lastly, can we change config.pos_att_type from a string that is converted to a list into a list directly?

Collaborator

@sgugger sgugger left a comment

Thanks a lot for adding this model.
It looks like you have used an old version of the template for the docstrings though, so there are quite a few things to adapt/update. We can help with that if necessary.

More generally, for the naming, I don't really like the capitals in the model names. For instance, BERT is BertModel/BertConfig/BertTokenizer and RoBERTa is RobertaModel/RobertaConfig/RobertaTokenizer. I think we should do the same for DeBERTa and have DebertaModel/DebertaConfig/DebertaTokenizer.

Collaborator

Also please copy and adapt from the template/BERT tokenizer.

Collaborator

Also please copy and adapt from the template/BERT tokenizer.

@LysandreJik
Member

Thanks @patrickvonplaten, @sgugger for the reviews. Will implement the changes tonight.

LysandreJik and others added 3 commits September 29, 2020 17:32
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
@LysandreJik LysandreJik merged commit 7a0cf0e into huggingface:master Sep 30, 2020
@LysandreJik
Member

Thanks for your work on this @BigBird01 :)

@BigBird01
Contributor Author

> Thanks for your work on this @BigBird01 :)

Thank you all for merging the code into master @LysandreJik @patrickvonplaten.
One question: why, after the merge, can't we find the documentation of the DeBERTa model at https://huggingface.co/transformers/?
Could you help check that?

@patrickvonplaten
Contributor

The documentation is online, you just have to click on "master" at the top left, right under the Hugging Face logo, as is done here: https://huggingface.co/transformers/master/. The next release will then show the DeBERTa docs by default :-)

@LysandreJik
Member

@BigBird01, two slow tests are failing with the DeBERTa models. Could you show how you implemented the integration tests so that I may investigate?

@BigBird01
Contributor Author

> @BigBird01, two slow tests are failing with the DeBERTa models. Could you show how you implemented the integration tests so that I may investigate?

@LysandreJik In the integration tests, I just feed the model fake input data and verify the model's output. It's similar to the RoBERTa tests. I may take a look at it today.
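For reference, a slow integration test of that shape typically looks roughly like the sketch below (illustrative only; the checkpoint name, token ids and checks are placeholders rather than the ones in the test suite):

import torch
from transformers import DebertaModel

def test_inference_no_head():
    # slow test: downloads the released checkpoint
    model = DebertaModel.from_pretrained("microsoft/deberta-base")
    model.eval()
    input_ids = torch.tensor([[1, 2314, 161, 2222, 56, 2]])  # arbitrary token ids
    with torch.no_grad():
        hidden_states = model(input_ids)[0]
    # shape check; the real test also compares a small slice of `hidden_states`
    # against values captured from a reference run of the original implementation
    assert hidden_states.shape == (1, input_ids.shape[1], model.config.hidden_size)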

@LysandreJik
Member

Thanks! DeBERTa may not be working correctly right now; knowing the source of the issue would be great.

@BigBird01
Contributor Author

> Thanks! DeBERTa may not be working correctly right now; knowing the source of the issue would be great.

The issue is that the model fails to load because of a parameter name mismatch. The checkpoint needs to be updated by changing the encoder name from 'bert' to 'deberta'.
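A minimal sketch of that kind of checkpoint update (an assumed approach for illustration, not the actual fix): rename the state-dict keys so the old 'bert.' prefix matches the new 'deberta.' prefix.

import torch

state_dict = torch.load("pytorch_model.bin", map_location="cpu")
renamed = {(k.replace("bert.", "deberta.", 1) if k.startswith("bert.") else k): v
           for k, v in state_dict.items()}
torch.save(renamed, "pytorch_model.bin")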

@BigBird01
Contributor Author

> Thanks! DeBERTa may not be working correctly right now; knowing the source of the issue would be great.

@LysandreJik I just made a fix to the test failure. #7645
Could you take a look?

@BigBird01 BigBird01 deleted the penhe/deberta branch February 4, 2021 19:14

Labels

model card, New model

8 participants