Conversation

lewtun (Member) commented Oct 10, 2022

What does this PR do?

This PR tweaks the swin modeling code to enable dynamic batch sizes in the ONNX export. With this fix, the ONNX slow tests for this model now pass, as do the slow tests for the original PyTorch model:

This passes: `RUN_SLOW=1 pytest -x -sv tests/models/swin/test_modeling_swin.py`
This also passes: `RUN_SLOW=1 pytest -x -sv tests/onnx/test_onnx_v2.py -k "swin"`
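
For reference, the failure mode here is the usual tracing pitfall: computing the batch size as a Python int during the forward pass bakes the dummy input's batch size into the exported graph as a constant. Below is a minimal sketch of the dynamic-friendly pattern, in the style of Swin's `window_reverse` (illustrative only, not the exact diff in this PR):

```python
import torch

def window_reverse_dynamic(windows, window_size, height, width):
    # windows: (num_windows * batch, window_size, window_size, channels).
    # Computing the batch size as a Python int, e.g.
    #   batch_size = int(windows.shape[0] / (height * width / window_size**2))
    # freezes the dummy input's batch size into the ONNX graph. Letting
    # view() infer the leading dimension with -1 keeps that axis symbolic.
    num_channels = windows.shape[-1]
    x = windows.view(-1, height // window_size, width // window_size,
                     window_size, window_size, num_channels)
    x = x.permute(0, 1, 3, 2, 4, 5).contiguous()
    return x.view(-1, height, width, num_channels)
```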

Since this change also impacts other models, I've checked that the modeling slow tests pass for:

  • maskformer
  • donut_swin
  • swin_v2

Related to #17476

@lewtun lewtun requested review from sgugger and ydshieh October 10, 2022 20:07
HuggingFaceDocBuilderDev commented Oct 10, 2022

The documentation is not available anymore as the PR was closed or merged.

ydshieh (Collaborator) left a comment

Hi @lewtun

Thank you for the fix ❤️ !

Is the issue caused by the changes in #19255? More precisely, from this newly added code:
https://github.com/dwyatte/transformers/blob/949683675d83cc38620106626822279cd45b076b/src/transformers/onnx/convert.py#L368

The error shows `Outputs values doesn't match between reference model and ONNX exported model` - it must have been non-trivial to figure out that this was coming from shapes! How were you able to find out 💯? Is there some tool we can use to check things (tensor values/shapes) when running ONNX inference?

ydshieh (Collaborator) commented Oct 11, 2022

(a bit off-topic, but still a related question)

Looking at the issue and the fix provided in this PR, I was expecting a lot of the same errors from hard-coded batch sizes elsewhere. However, when I checked bert, I found:

`input_shape = input_ids.size()`

and

`batch_size, seq_length = input_shape`

but the ONNX tests still pass for bert. Are `batch_size` and `seq_length` not hard-coded here? Just wondering if @lewtun already has some insight regarding this.

lewtun (Member, Author) commented Oct 11, 2022

> Is the issue caused by the changes in #19255? More precisely, from this newly added code: https://github.com/dwyatte/transformers/blob/949683675d83cc38620106626822279cd45b076b/src/transformers/onnx/convert.py#L368
>
> The error shows `Outputs values doesn't match between reference model and ONNX exported model` - it must have been non-trivial to figure out that this was coming from shapes! How were you able to find out 💯? Is there some tool we can use to check things (tensor values/shapes) when running ONNX inference?

Yes, this issue was surfaced by #19255, which implemented a stronger validation test on exported ONNX models. Basically, it generates the ONNX graph using dummy data with one batch size `b`, and then validates the forward pass with a different batch size `b'`.
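
Here is a minimal sketch of that validation idea, assuming a toy model with a single tensor input (all names and shapes are illustrative, not the actual test code):

```python
import numpy as np
import onnxruntime as ort
import torch

def check_dynamic_batch(model, export_bs=2, check_bs=3, seq=8, hidden=16):
    # Export the graph with one batch size b...
    dummy = torch.randn(export_bs, seq, hidden)
    torch.onnx.export(model, dummy, "model.onnx",
                      input_names=["input"], output_names=["output"],
                      dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}})
    # ...then compare PyTorch and ONNX Runtime outputs at a different b'.
    test = torch.randn(check_bs, seq, hidden)
    reference = model(test).detach().numpy()
    session = ort.InferenceSession("model.onnx")
    onnx_output = session.run(None, {"input": test.numpy()})[0]
    # Fails if the original batch size leaked into the graph as a constant.
    np.testing.assert_allclose(reference, onnx_output, atol=1e-5)

check_dynamic_batch(torch.nn.Linear(16, 16))
```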

The reason it can be non-trivial to figure out why the PyTorch and ONNX models disagree after an export is that ONNX traces a graph based on dummy data, and this tracing can be incorrect if there is data-dependent control flow (Swin in particular has a lot of these if/else statements). Currently, the best tool I know of is to visualise the graph with Netron and manually inspect it for discrepancies.
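
To make that failure mode concrete, here is a hypothetical example of the kind of branch that tracing silently specialises:

```python
import torch

class PadIfOdd(torch.nn.Module):
    def forward(self, x):
        # This condition is evaluated once, as a plain Python bool, on the
        # dummy input's concrete shape; only the branch taken at trace time
        # ends up in the exported graph, so inputs that should take the
        # other branch are handled incorrectly at inference time.
        if x.shape[-1] % 2 != 0:
            x = torch.nn.functional.pad(x, (0, 1))
        return x
```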

> Looking at the issue and the fix provided in this PR, I was expecting a lot of the same errors from hard-coded batch sizes elsewhere. However, when I checked bert:

I think in those cases we don't hit a problem because `batch_size` is only used to create the attention mask when none is provided. Since our dummy input provides an attention mask, that branch of the graph is never traced, AFAICT.
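
For illustration, the relevant pattern in bert looks roughly like this (paraphrased, not the exact source):

```python
# batch_size and seq_length only feed the default mask. The ONNX dummy
# inputs always include attention_mask, so this branch is never taken
# during tracing and the concrete sizes never reach the graph.
if attention_mask is None:
    attention_mask = torch.ones((batch_size, seq_length), device=device)
```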

sgugger (Collaborator) left a comment

LGTM, thanks for working on this!

@lewtun lewtun merged commit b651efe into main Oct 11, 2022
@lewtun lewtun deleted the onnx-fix-swin branch October 11, 2022 13:21