Add X-MOD #20939
Conversation
The documentation is not available anymore as the PR was closed or merged.
younesbelkada left a comment
This looks super clean to me! Thanks a lot for your huge work and adding all those models!
I left a couple of comments, mostly nits and open questions!
We should be really close to merging this!
@younesbelkada Thank you for the swift code review, much appreciated!
younesbelkada left a comment
This looks great to me! Thanks for your work on this!
Handing the PR over now to @ArthurZucker & @sgugger for final approvals and reviews!
sgugger left a comment
Thanks a lot for adding this new model! My two main comments are around naming (see the first comment below, but please switch all XMOD to Xmod in class names) and type annotations. While we welcome them in signatures, in the code itself they are usually redundant if names are aptly chosen, and we don't use them in the rest of the Transformers codebase.
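For illustration, a minimal sketch of the convention described above (the function and names here are made up, not taken from this PR): type hints are welcome in the signature, while locals are left unannotated when the name already makes the type clear.

```python
import torch


def scale_hidden_states(hidden_states: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    # the name already says what this is, so no `: torch.Tensor` annotation on the local
    scaled_states = hidden_states * scale
    return scaled_states
```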
@sgugger Thanks for the review. Your suggestions have now been implemented.
ArthurZucker left a comment
Very clean overall! Some tests are missing, but I'm very impressed by the good use of `# Copied from`! 😉
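For context, the `# Copied from` markers mentioned here are checked by `utils/check_copies.py` in the Transformers repo and keep code in sync with the referenced model. A typical usage might look like the sketch below; the exact source class shown is an assumption for illustration, not necessarily one this PR references.

```python
from torch import nn


# Copied from transformers.models.roberta.modeling_roberta.RobertaSelfOutput with Roberta->Xmod
class XmodSelfOutput(nn.Module):
    ...  # body kept identical to the RoBERTa version, verified by the check_copies script
```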
Since the models are from Meta, we should probably move them and update this before merging.
Nice 😉
Since it is not really an Output but rather a projection, we could have renamed this, but I suppose it follows what was done in roberta.
Same here, we could remove this strange variable.
A few tests are missing, with examples for XmodForMaskedLM, XmodForCausalLM, etc. These would make sure that they also work. Very simple testing is enough, but it would be great if all the pipelines work with the different variations of the model.
This file already contains tests for XmodForMaskedLM, XmodForCausalLM etc.
The test coverage is the same as with other XLM-based models, e.g. xlm_roberta_xl.
Yes and no! If they were not in the previous code, then we are lucky that it works!
What I am suggesting here is just adding simple tests to check that model.generate() has the expected behaviour with respect to the original code (so as part of the integration tests).
You don't have to use the original codebase to compare the examples, but at least having correct generation tests will prevent us from making bad modifications to the generate() function that would not be caught by the current tests! This is valid for various models 😉
Thanks for clarifying.
I have now added an integration test for FillMaskPipeline.
I hesitate to check the output of other pipelines because there are no trained models for those pipelines. Checking the output of randomly initialized models amounts to a guarantee that different versions of transformers perform identical initialization. Such a guarantee would be out of the scope of this PR and should be discussed in a separate issue.
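For reference, a minimal sketch of what such a fill-mask check could look like (not the exact test added in this PR; the checkpoint name, the language code, and the expected result are assumptions for illustration):

```python
import torch

from transformers import AutoTokenizer, XmodForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("facebook/xmod-base")  # checkpoint name assumed
model = XmodForMaskedLM.from_pretrained("facebook/xmod-base")
model.set_default_language("en_XX")  # X-MOD routes inputs through per-language adapters

inputs = tokenizer("Hello, I'm a <mask> model.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# pick the highest-scoring token at the mask position
mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```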
It is perfect. If there are no pretrained models, there is no need to test the pipelines!
Can you also add the model to the
@ArthurZucker Thanks for the code review. I have now implemented the changes you requested. I agree that the models should be moved to the facebook organization but do not have the permissions to do so.
About moving the weights: I think I am in the org and can help with that / ask to add you to transfer them 😉
If it is a pipeline test, let's put it in the test_pipeline_fill_mask file.
It's cool that you added it. I was mentioning something simpler with just model.generate().
Hi @ArthurZucker, thanks for pointing out that there are missing tests in this PR. As of now, there are the following tests:
Could you please clarify which tests still need to be added?
Hey! Thanks for bearing with me.
```python
def test_batch_generation(self):
    model_id = "facebook/opt-350m"
    tokenizer = GPT2Tokenizer.from_pretrained(model_id)
    model = OPTForCausalLM.from_pretrained(model_id)
    model.to(torch_device)

    tokenizer.padding_side = "left"

    # use different length sentences to test batching
    sentences = [
        "Hello, my dog is a little",
        "Today, I",
    ]

    inputs = tokenizer(sentences, return_tensors="pt", padding=True)
    input_ids = inputs["input_ids"].to(torch_device)

    outputs = model.generate(
        input_ids=input_ids,
        attention_mask=inputs["attention_mask"].to(torch_device),
    )

    inputs_non_padded = tokenizer(sentences[0], return_tensors="pt").input_ids.to(torch_device)
    output_non_padded = model.generate(input_ids=inputs_non_padded)

    num_paddings = inputs_non_padded.shape[-1] - inputs["attention_mask"][-1].long().sum().cpu().item()
    inputs_padded = tokenizer(sentences[1], return_tensors="pt").input_ids.to(torch_device)
    output_padded = model.generate(input_ids=inputs_padded, max_length=model.config.max_length - num_paddings)

    batch_out_sentence = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    non_padded_sentence = tokenizer.decode(output_non_padded[0], skip_special_tokens=True)
    padded_sentence = tokenizer.decode(output_padded[0], skip_special_tokens=True)

    expected_output_sentence = [
        "Hello, my dog is a little bit of a dork.\nI'm a little bit",
        "Today, I was in the middle of a conversation with a friend about the",
    ]
    self.assertListEqual(expected_output_sentence, batch_out_sentence)
    self.assertListEqual(batch_out_sentence, [non_padded_sentence, padded_sentence])
```

Does that make sense? 😉
The CI tests are broken, but it is not your fault! We are going to have to wait until the basic docker properly runs, but the added test looks good 😉
Hi @jvamvas! Then run the usual
@younesbelkada Sorry about the bad rebase. On the plus side, the tests are now passing again 🎉
Yeah hahah. Do you think you can reset, then rebase instead of merge? 😉
@ArthurZucker Done. The failing test is not related to this PR.
Great work! Thanks for working on this model! 🥳
Add the X-MOD models released with the paper Lifting the Curse of Multilinguality by Pre-training Modular Transformers.
Implementation notes
Before submitting
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.