Conversation

@WaterKnight1998

What does this PR do?

This PR reimplements the VisionEncoderDecoder ONNX conversion as a seq2seq model, as the documentation explains: encoder-decoder models inherit from OnnxSeq2SeqConfigWithPast.

PR #19254 didn't follow these classes. There are several examples in the repo of how to use them, e.g.: https://github.com/huggingface/transformers/blob/v4.22.2/src/transformers/models/mbart/configuration_mbart.py
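
For illustration, a minimal sketch of what such a config could look like, modeled on the MBart example above (the class name and dynamic-axis labels are illustrative, not the exact code in this PR):

```python
from collections import OrderedDict
from typing import Mapping

from transformers.onnx import OnnxSeq2SeqConfigWithPast


class VisionEncoderDecoderOnnxConfig(OnnxSeq2SeqConfigWithPast):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        # The encoder consumes pixel values instead of token ids
        common_inputs = OrderedDict(
            [
                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
                ("decoder_input_ids", {0: "batch", 1: "decoder_sequence"}),
            ]
        )
        if self.use_past:
            # With cached key/values, only the newest decoder token is fed in
            common_inputs["decoder_input_ids"] = {0: "batch"}
            self.fill_with_past_key_values_(common_inputs, direction="inputs")
        return common_inputs
```

The `use_past` branch is what lets the exporter produce a graph that accepts cached key-value pairs, as the other seq2seq configs do.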

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of who to tag.
Please tag fewer than 3 people.

@chainyo for OnnxConfigs

@lewtun & @sgugger for approving PR: #19254

mht-sharma and others added commits October 10, 2022 21:55

  • …Seq Config.
  • Fixing issue.
  • trying to get this working.
  • Trying to fix.
  • Fixing issues.
  • More fixes.
  • Fixing error.
  • Solving more errors.
  • Trying to fix.
  • Fix
  • More fixes.
  • Debugging.
  • sdgsg
  • Formatting code.
  • aaaa
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

lewtun (Member) commented Oct 11, 2022

Hi @WaterKnight1998, thanks for your PR!

Indeed, PR #19254 splits the model into separate encoder / decoder pieces, which differs from other seq2seq models that are currently implemented as a single ONNX graph. The main reasons are the following:

  • To speed up the decoding process, it is more efficient to have a single pass through the encoder, followed by N passes through the decoder.
  • To support the caching of past key-value pairs, it is more efficient to have a separate decoder graph.

Do you happen to have a latency benchmark for the VisionEncoderDecoder export that compares your PR vs the current implementation? I think this would be the axis on which we'd consider incorporating your changes, but I would be surprised if a single graph can beat the decomposed ones.
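
To make the decomposition concrete, here is a rough sketch of the generation loop it enables (the file and tensor names are hypothetical, and the past key-value cache is omitted for brevity):

```python
import numpy as np
import onnxruntime as ort

# Hypothetical file names for the two exported graphs
encoder = ort.InferenceSession("encoder.onnx")
decoder = ort.InferenceSession("decoder.onnx")

def greedy_generate(pixel_values, bos_id, eos_id, max_length=32):
    # Single pass through the encoder
    encoder_hidden_states = encoder.run(None, {"pixel_values": pixel_values})[0]

    generated = [bos_id]
    for _ in range(max_length):
        # One decoder pass per generated token, reusing the cached encoder output
        logits = decoder.run(
            None,
            {
                "input_ids": np.array([generated], dtype=np.int64),
                "encoder_hidden_states": encoder_hidden_states,
            },
        )[0]
        next_token = int(logits[0, -1].argmax())
        generated.append(next_token)
        if next_token == eos_id:
            break
    return generated
```

With the cache enabled, each decoder pass would feed back the past key-values and only process the newest token, which is where the latency win over a single combined graph comes from.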

You can find more information in our optimum library, where we'll be implementing the pipeline for inference: https://huggingface.co/docs/optimum/onnxruntime/modeling_ort#export-and-inference-of-sequencetosequence-models

cc @mht-sharma re the original implementation

WaterKnight1998 (Author) commented Oct 11, 2022

> Hi @WaterKnight1998, thanks for your PR!
>
> Indeed, PR #19254 splits the model into separate encoder / decoder pieces, which differs from other seq2seq models that are currently implemented as a single ONNX graph. The main reasons are the following:
>
>   • To speed up the decoding process, it is more efficient to have a single pass through the encoder, followed by N passes through the decoder.
>   • To support the caching of past key-value pairs, it is more efficient to have a separate decoder graph.
>
> Do you happen to have a latency benchmark for the VisionEncoderDecoder export that compares your PR vs the current implementation? I think this would be the axis on which we'd consider incorporating your changes, but I would be surprised if a single graph can beat the decomposed ones.

I will try to create one, but I hadn't taken into account what you mention. It makes sense to do just a single pass through the encoder.
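
Something along these lines is what I have in mind for measuring per-step latency (the file and input names are hypothetical):

```python
import time

import numpy as np
import onnxruntime as ort

def mean_latency_ms(session, feeds, runs=100):
    session.run(None, feeds)  # warm-up so session initialization is not measured
    start = time.perf_counter()
    for _ in range(runs):
        session.run(None, feeds)
    return (time.perf_counter() - start) / runs * 1000

# Hypothetical single-graph export from this PR
single = ort.InferenceSession("vision_encoder_decoder.onnx")
feeds = {
    "pixel_values": np.random.rand(1, 3, 224, 224).astype(np.float32),
    "decoder_input_ids": np.zeros((1, 1), dtype=np.int64),
}
print(f"single graph: {mean_latency_ms(single, feeds):.2f} ms / step")
```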

> You can find more information in our optimum library, where we'll be implementing the pipeline for inference: https://huggingface.co/docs/optimum/onnxruntime/modeling_ort#export-and-inference-of-sequencetosequence-models

I am looking forward to this implementation. I am thinking of running the Donut model in production, and it would be very useful there, since the inference times are currently pretty bad. I have an open PR for its ONNX conversion: #19401

WaterKnight1998 (Author) commented Oct 11, 2022

@lewtun is this the pipeline you mentioned: https://github.com/huggingface/optimum/blob/996f209147a466c7ecf5bfb29c9fd2e9831ea3a7/optimum/onnxruntime/modeling_seq2seq.py#L154?

mht-sharma (Contributor) commented

> @lewtun is this the pipeline you mentioned: https://github.com/huggingface/optimum/blob/996f209147a466c7ecf5bfb29c9fd2e9831ea3a7/optimum/onnxruntime/modeling_seq2seq.py#L154?

Yes @WaterKnight1998, in this implementation the encoder and decoder parts are exported separately and inference is performed using ONNX Runtime (ORT).
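
For example, a minimal usage sketch (the checkpoint is just an example, and `from_transformers=True` reflects the optimum API at the time):

```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer

# Exports the encoder and decoder to ONNX under the hood, then runs them with ORT
model = ORTModelForSeq2SeqLM.from_pretrained("t5-small", from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: hello world", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```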

github-actions (bot) commented

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot closed this Nov 18, 2022
WaterKnight1998 (Author) commented

@lewtun @sgugger please reopen it

lewtun reopened this Nov 24, 2022
github-actions (bot) commented

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot closed this Dec 26, 2022