Add X-MOD #20939
Conversation
The documentation is not available anymore as the PR was closed or merged.
younesbelkada left a comment
This looks super clean to me! Thanks a lot for your huge work and adding all those models!
I left a couple of comments, mostly nits and open questions!
We should be really close to merging this!
@younesbelkada Thank you for the swift code review, much appreciated!
younesbelkada left a comment
This looks great to me! Thanks for your work on this!
Handing the PR over now to @ArthurZucker & @sgugger for final approvals and reviews!
sgugger left a comment
Thanks a lot for adding this new model! My two main comments are around naming (see the first comment below, but please switch all XMOD to Xmod in class names) and type annotations. While we welcome them in signatures, in the code itself they are usually redundant if names are aptly chosen, and we don't use them in the rest of the Transformers codebase.
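For illustration, a minimal sketch of the convention described above (the function and names here are made up, not taken from this PR): type hints are welcome in the signature, while locals are left unannotated when the name already makes the type clear.

```python
import torch


def scale_hidden_states(hidden_states: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    # the name already says what this is, so no `: torch.Tensor` annotation on the local
    scaled_states = hidden_states * scale
    return scaled_states
```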
@sgugger Thanks for the review. Your suggestions have now been implemented.
ArthurZucker left a comment
Very clean overall! Some tests are missing, but I'm very impressed by the good use of `# Copied from`! 😉
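For context, the `# Copied from` markers mentioned here are checked by `utils/check_copies.py` in the Transformers repo and keep code in sync with the referenced model. A typical usage might look like the sketch below; the exact source class shown is an assumption for illustration, not necessarily one this PR references.

```python
from torch import nn


# Copied from transformers.models.roberta.modeling_roberta.RobertaSelfOutput with Roberta->Xmod
class XmodSelfOutput(nn.Module):
    ...  # body kept identical to the RoBERTa version, verified by the check_copies script
```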
Since the models are from Meta, we should probably move them and update this before merging.
Nice 😉
Since it is not really an Output but rather a projection, we could have renamed this, but I suppose it follows what was done in roberta.
Same here, we could remove this strange variable.
A few tests are missing, with examples for XmodForMaskedLM, XmodForCausalLM, etc. These would make sure that they also work. Very simple testing is enough, but it would be great if all the pipelines work with the different variations of the model.
This file already contains tests for XmodForMaskedLM, XmodForCausalLM etc.
The test coverage is the same as with other XLM-based models, e.g. xlm_roberta_xl.
Yes and no! If they were not in the previous code, then we are lucky that it works!
What I am suggesting here is just adding simple tests to check that model.generate() has the expected behaviour with respect to the original code (so as part of the integration tests).
You don't have to use the original codebase to compare the examples, but at least having correct generation tests will prevent us from making bad modifications to the generate() function that would not be caught by the current tests! This is valid for various models 😉
Thanks for clarifying.
I have now added an integration test for FillMaskPipeline.
I hesitate to check the output of other pipelines because there are no trained models for those pipelines. Checking the output of randomly initialized models amounts to a guarantee that different versions of transformers perform identical initialization. Such a guarantee would be out of the scope of this PR and should be discussed in a separate issue.
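For reference, a minimal sketch of what such a fill-mask check could look like (not the exact test added in this PR; the checkpoint name, the language code, and the expected result are assumptions for illustration):

```python
import torch

from transformers import AutoTokenizer, XmodForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("facebook/xmod-base")  # checkpoint name assumed
model = XmodForMaskedLM.from_pretrained("facebook/xmod-base")
model.set_default_language("en_XX")  # X-MOD routes inputs through per-language adapters

inputs = tokenizer("Hello, I'm a <mask> model.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# pick the highest-scoring token at the mask position
mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```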
It is perfect. If there are no pretrained models, there is no need to test the pipelines!
Can you also add the model to the
@ArthurZucker Thanks for the code review. I have now implemented the changes you requested. I agree that the models should be moved to the facebook organization but do not have the permissions to do so.
About moving the weights: I think I am in the org and can help with that / ask to add you to transfer them 😉
If it is a pipeline test, let's put it in the test_pipeline_fill_mask file.
It's cool that you added it. I was mentioning something simpler with just model.generate().
Hi @ArthurZucker, thanks for pointing out that there are missing tests in this PR. As of now, there are the following tests:
Could you please clarify which tests still need to be added?
Hey! Thanks for bearing with me.
```python
def test_batch_generation(self):
    model_id = "facebook/opt-350m"
    tokenizer = GPT2Tokenizer.from_pretrained(model_id)
    model = OPTForCausalLM.from_pretrained(model_id)
    model.to(torch_device)

    tokenizer.padding_side = "left"

    # use different length sentences to test batching
    sentences = [
        "Hello, my dog is a little",
        "Today, I",
    ]

    inputs = tokenizer(sentences, return_tensors="pt", padding=True)
    input_ids = inputs["input_ids"].to(torch_device)

    outputs = model.generate(
        input_ids=input_ids,
        attention_mask=inputs["attention_mask"].to(torch_device),
    )

    inputs_non_padded = tokenizer(sentences[0], return_tensors="pt").input_ids.to(torch_device)
    output_non_padded = model.generate(input_ids=inputs_non_padded)

    num_paddings = inputs_non_padded.shape[-1] - inputs["attention_mask"][-1].long().sum().cpu().item()
    inputs_padded = tokenizer(sentences[1], return_tensors="pt").input_ids.to(torch_device)
    output_padded = model.generate(input_ids=inputs_padded, max_length=model.config.max_length - num_paddings)

    batch_out_sentence = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    non_padded_sentence = tokenizer.decode(output_non_padded[0], skip_special_tokens=True)
    padded_sentence = tokenizer.decode(output_padded[0], skip_special_tokens=True)

    expected_output_sentence = [
        "Hello, my dog is a little bit of a dork.\nI'm a little bit",
        "Today, I was in the middle of a conversation with a friend about the",
    ]
    self.assertListEqual(expected_output_sentence, batch_out_sentence)
    self.assertListEqual(batch_out_sentence, [non_padded_sentence, padded_sentence])
```

Does that make sense? 😉
The CI tests are broken, but it is not your fault! We are going to have to wait until the basic docker properly runs, but the added test looks good 😉
Hi @jvamvas! Then run the usual
@younesbelkada Sorry about the bad rebase. On the plus side, the tests are now passing again 🎉
Yeah hahah. Do you think you can reset, then rebase instead of merge? 😉
@ArthurZucker Done. The failing test is not related to this PR.
Great work! Thanks for working on this model! 🥳
Add the X-MOD models released with the paper Lifting the Curse of Multilinguality by Pre-training Modular Transformers.
Implementation notes
Before submitting
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.