Conversation

@lewtun (Member) commented Dec 1, 2021

What does this PR do?

This PR adds support for exporting MarianMT models in the ONNX format. The underlying logic builds on the awesome refactor / feature enhancement that @michaelbenayoun has implemented in #14358 & #14700 - we should rebase this branch on master once that PR is merged to simplify the diff in this PR. (Done)

Currently, this PR supports ONNX exports for the following "tasks" (i.e. uses):

  • default, default-with-past => equivalent to exporting a pretrained MarianModel
  • seq2seq-lm, seq2seq-lm-with-past => equivalent to exporting a pretrained MarianMTModel
  • causal-lm, causal-lm-with-past => equivalent to exporting a pretrained MarianForCausalLM

Note that in each case, the end user will have to implement their own generate() method with the ONNX model - see this BART example for what's involved.
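
For illustration, here is a rough sketch of what a minimal greedy-decoding loop over a seq2seq-lm export could look like. It is not a drop-in replacement for generate(); the output name "logits", the use of config.decoder_start_token_id / config.eos_token_id, and the model path are assumptions based on the BART-style export used here.

import numpy as np
import onnxruntime as ort
from transformers import AutoConfig, AutoTokenizer

model_ckpt = "Helsinki-NLP/opus-mt-en-de"
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
config = AutoConfig.from_pretrained(model_ckpt)
# Path produced by the export command in the Usage section below (assumed layout)
session = ort.InferenceSession(f"onnx/{model_ckpt}-seq2seq-lm/model.onnx")

encoder_inputs = tokenizer(
    "Studies have been shown that owning a dog is good for you", return_tensors="np"
)
# Start decoding from the configured decoder start token
decoder_input_ids = np.array([[config.decoder_start_token_id]], dtype=np.int64)

for _ in range(64):  # crude maximum generation length
    logits = session.run(
        ["logits"],  # assumed output name for the seq2seq-lm feature
        {
            "input_ids": encoder_inputs["input_ids"].astype(np.int64),
            "attention_mask": encoder_inputs["attention_mask"].astype(np.int64),
            "decoder_input_ids": decoder_input_ids,
            "decoder_attention_mask": np.ones_like(decoder_input_ids),
        },
    )[0]
    # Greedily pick the most likely next token and append it to the decoder input
    next_token = logits[:, -1, :].argmax(axis=-1).reshape(-1, 1)
    decoder_input_ids = np.concatenate([decoder_input_ids, next_token], axis=-1)
    if next_token.item() == config.eos_token_id:
        break

print(tokenizer.batch_decode(decoder_input_ids, skip_special_tokens=True))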

I've also checked locally that the "slow" tests pass with:

RUN_SLOW=1 pytest tests/test_onnx_v2.py -k "marian" -rp

Usage

Here's a quick example to show how this works:

import onnxruntime as ort
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from transformers.models.marian import MarianOnnxConfig

model_ckpt = "Helsinki-NLP/opus-mt-en-de"
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
ref_model = AutoModelForSeq2SeqLM.from_pretrained(model_ckpt)
# Export model
feature = "seq2seq-lm"
onnx_path = f"onnx/{model_ckpt}-{feature}/"
# Run this from a Jupyter notebook
!python -m transformers.onnx --model={model_ckpt} --atol=1e-4 --feature={feature} {onnx_path}
# Test export with inputs
batch_size = 4
encoder_inputs = tokenizer(
    ["Studies have been shown that owning a dog is good for you"] * batch_size,
    return_tensors="np",
)
decoder_inputs = tokenizer(
    ["Studien haben gezeigt dass es hilfreich ist einen Hund zu besitzen"]
    * batch_size,
    return_tensors="np",
)
all_inputs = {
    "input_ids": encoder_inputs["input_ids"],
    "attention_mask": encoder_inputs["attention_mask"],
    "decoder_input_ids": decoder_inputs["input_ids"],
    "decoder_attention_mask": decoder_inputs["attention_mask"],
}
# Generate ONNX outputs
ort_session = ort.InferenceSession(f"{onnx_path}model.onnx")
onnx_config = MarianOnnxConfig(ref_model.config, task=feature)
onnx_named_outputs = list(onnx_config.outputs.keys())
onnx_outputs = ort_session.run(onnx_named_outputs, all_inputs)
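
To double-check the export beyond the CLI's built-in --atol validation, one can for example compare these ONNX outputs against the reference PyTorch model. A minimal sketch, reusing the objects defined above and assuming the first named output is the logits tensor:

import numpy as np
import torch

# Reference forward pass with the PyTorch model on the same inputs
with torch.no_grad():
    ref_outputs = ref_model(**{k: torch.from_numpy(v) for k, v in all_inputs.items()})

# onnx_outputs is ordered like onnx_named_outputs; "logits" is assumed to come first
np.testing.assert_allclose(ref_outputs.logits.numpy(), onnx_outputs[0], atol=1e-4)
print("ONNX and PyTorch logits agree to within atol=1e-4")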

TODO

  • Extend support for language modelling head
  • Investigate range of numerical tolerance between raw and ONNX models for a range of checkpoints
  • Ensure that ONNX models are compatible with ONNX Runtime
  • Verify whether past key values are supported

Closes #13823, #13854

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@lewtun (Member Author) commented Dec 8, 2021

There seems to be some sort of race condition happening in run_tests_torch:

_____________________________ ERROR collecting gw1 _____________________________
Different tests were collected between gw0 and gw1. The difference is:
--- gw0

+++ gw1

This issue has similar problems - perhaps a solution lies there.

- GPT Neo
- LayoutLM
- Longformer
- Marian
Member Author:

I'm not sure whether .rst files are still allowed with the new .mdx doc - does this need updating / changing?

Member:

Letting @LysandreJik answer this one.

Member Author:

I saw that Sylvain recently converted all the RST files to MDX, so I'll rebase and this file should disappear :)

@lewtun lewtun marked this pull request as ready for review December 22, 2021 14:11
self._setup_normalizer()

- def num_special_tokens_to_add(self, **unused):
+ def num_special_tokens_to_add(self, *args, **kwargs):
Member Author:

This change is required to accommodate the use of positional arguments like tokenizer.num_special_tokens_to_add(is_pair) in _generate_dummy_inputs_for_sequence_classification_and_question_answering().

I'm not sure why we had **unused in the first place, but the change also seems more conventional IMO.
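
As a quick illustration of why the positional call breaks with the old signature, here is a hypothetical minimal sketch (these toy classes are not the real tokenizer):

# Hypothetical illustration of the signature change; not the real MarianTokenizer.
class OldTokenizer:
    def num_special_tokens_to_add(self, **unused):
        return 1

class NewTokenizer:
    def num_special_tokens_to_add(self, *args, **kwargs):
        return 1

is_pair = False
# OldTokenizer().num_special_tokens_to_add(is_pair)  # TypeError: unexpected positional argument
print(NewTokenizer().num_special_tokens_to_add(is_pair))  # works with the positional call
                                                           # used by the dummy-input helper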

]
return common_inputs

def _generate_dummy_inputs_for_sequence_classification_and_question_answering(
Member Author:

Technically, Marian doesn't have heads for sequence classification or question answering and this function is here due to the copy-paste from the BART config.

If you think this is confusing, I can remove this function and refactor the other dummy generation functions accordingly.

Member:

I think it can be done, you'll just have to remove the # Copied from comment at the top of the class declaration.

Member:

You could remove the # Copied from comment which is at the top of the class declaration and add it only to methods (see the sketch below). It supports methods as well as classes.
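
For reference, the method-level form of that comment looks roughly like the sketch below; the base class and method signature are assumptions meant only to show where the comment goes, and the real config in this PR is the source of truth:

from transformers.onnx import OnnxSeq2SeqConfigWithPast

class MarianOnnxConfig(OnnxSeq2SeqConfigWithPast):
    # Copied from transformers.models.bart.configuration_bart.BartOnnxConfig.generate_dummy_inputs with Bart->Marian
    def generate_dummy_inputs(self, tokenizer, batch_size=-1, seq_length=-1, is_pair=False, framework=None):
        ...  # sketch only: the copied method body goes here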

)


# Copied from transformers.models.bart.configuration_bart.BartOnnxConfig with Bart->Marian
Member Author:

Since the Marian model is copied from BART (see modeling_marian.py), I adopted a similar approach for the ONNX config.

Member:

Yes, nice!

@LysandreJik (Member) left a comment:
Looks good, thank you @lewtun!

@LysandreJik (Member):
Feel free to merge once you have taken care of the docs and the # Copied from statements :)

]
return common_inputs

def _generate_dummy_inputs_for_encoder_and_decoder(
@lewtun (Member Author) commented Dec 23, 2021:

I renamed this function from _generate_dummy_inputs_for_sequence_classification_and_question_answering() to something that more closely reflects its usage in the other dummy input functions.

As noted earlier, Marian models don't have sequence classification or question answering heads, so this change is aimed at minimizing confusion for those inspecting the source code.

@lewtun (Member Author) commented Dec 23, 2021

Thanks for the reviews @LysandreJik and @michaelbenayoun 🙏 !

I've fixed the docs by rebasing on master and added the # Copied from snippets to the functions (I did not know about that trick!)

Will merge once all the tests pass :)

@chaodreaming:

Decoding the outputs does not give the correct result. How do you get the translation result?
