Fix donut image processor #20625

amyeroberts · 2022-12-06T17:39:25Z

What does this PR do?

This PR addresses failing integration tests for the Donut image processor which involves four main changes:

Resolve bug where size wasn't passed to do_align_axis
Remove a bug in the get_resize_output_image_size function which wouldn't take account of max_size (inherited from previous resize without fixing)
Update logic for getting output size in thumbnail method - ensuring the image dimensions are never increased.
Update test values to reflect changes in resizing logic for thumbnail creation - see notes below.

Changing resizing logic for `thumbnail` method

The DonutFeatueExtractor used the Pillow thumbnail functionality to resize images which was replaced with reusing resize in the image_transforms library. This was done primarily as image.thumbnail modifies in place and uses Pillow's resize with some additional logic for calculating the output size. Unlike resize which will resize an image to the requested (height, width), thumbnail will produce an image which is no larger than the original image or requested size i.e. it will scale down an image preserving the aspect ratio c.f. Pillow docs.

This is a similar behaviour to torchvision when resizing:

the shortest image edge is resized to size (int for torchvision, min(requested_heigh, requsted_width) for Pillow)
the other edge is resized to preserve the aspect ratio
if the longest edge > max_size, the longest edge is resized to max_size and the shortest edge resized to preserve the aspect ratio.

The calculation of the other dimension to preserve the aspect ratio is slightly different between the libraries. In pytorch the length of the edge is found using int to round, whereas Pillow rounds to the value which produces an aspect ratio closest to the original image. The torchvision resizing logic is replicated in our image transforms library here.

In the test tests/models/vision_encoder_decoder/test_modeling_vision_encoder_decoder.py::DonutModelIntegrationTest::test_inference_docvqa, the input image to the thumbnail method has dimension (3713, 1920). The requested size is (2560, 1920). image.thumbnail will resize to (2560, 1373) and our resizing logic (matching torchvision) will resize to (2560, 1374).

As using torchvision resizing logic is more consistent with the rest of the library; Donut is the only model in the library that used the Pillow thumbnail functionality, and is more experimental than other models; I considered this to be an acceptable change.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you write any new necessary tests?

HuggingFaceDocBuilderDev · 2022-12-06T17:52:57Z

The documentation is not available anymore as the PR was closed or merged.

sgugger

Thanks for the fix!

ydshieh

Thanks @amyeroberts . LGTM, but I don't see any change related to

Resolve bug where size wasn't passed to do_align_axis

Do I miss anything?

amyeroberts · 2022-12-08T10:43:57Z

Thanks @amyeroberts . LGTM, but I don't see any change related to
Resolve bug where size wasn't passed to do_align_axis
Do I miss anything?

Nope - I've pushed it now :)

amyeroberts · 2022-12-08T16:20:47Z

@sgugger @ydshieh This also uncovered another sneaky bug when resizing:

When resizing, the image is coverted to PIL.Image.Image from numpy. The channel dimension format of the input image, before resizing inferred is also inferred.
When the image is converted back to numpy the image is always in "ChannelDimension.LAST" format
A final to_channel_dimension_format call is made to make sure the output resized image is in the same channel dimension format as the input.
In to_channel_dimension_format the input image (resized in this case) channel dimension format is inferred and compared to the requested format.
If the height dimensions is of size 3 or 1, then format is incorrectly inferred as ChannelDimension.FIRST
This resulted in images in the incorrect format being returned after resizing

For practical purposes, this doesn't cause an issue as it's very unlikely an image has a height dimension of 3. However, it results in flaky tests and is a bug.

I've added an optional input_channel_dimension argument to to_channel_dimension_format which resolves this and additional tests for our resize functionality which previously failed and now pass with this update.

sgugger

Works for me, thanks for the deep cleaning!

ydshieh

Thanks again

ydshieh · 2022-12-08T16:56:28Z

src/transformers/image_transforms.py

+        resized_image = to_channel_dimension_format(
+            resized_image, data_format, input_channel_dim=ChannelDimension.LAST
+        )
+        # resized_image = to_channel_dimension_format(resized_image, data_format)


let's clean this line before merge :-)

src/transformers/image_transforms.py

* fix donut image processor * Update test values * Apply lower bound on resizing size * Add in missing size param * Resolve resize channel_dimension bug * Update src/transformers/image_transforms.py

amyeroberts marked this pull request as ready for review December 6, 2022 19:48

amyeroberts requested review from sgugger and ydshieh December 6, 2022 20:30

sgugger approved these changes Dec 6, 2022

View reviewed changes

ydshieh approved these changes Dec 7, 2022

View reviewed changes

amyeroberts added 4 commits December 8, 2022 12:15

fix donut image processor

3f9e374

Update test values

4ccfce8

Apply lower bound on resizing size

ba754c7

Add in missing size param

a96e89b

amyeroberts force-pushed the fix-donut-image-processor branch from 14aa2c5 to a96e89b Compare December 8, 2022 12:15

Resolve resize channel_dimension bug

e5ea8f8

sgugger approved these changes Dec 8, 2022

View reviewed changes

ydshieh approved these changes Dec 8, 2022

View reviewed changes

amyeroberts commented Dec 8, 2022

View reviewed changes

src/transformers/image_transforms.py Outdated Show resolved Hide resolved

Update src/transformers/image_transforms.py

a3dbfe8

amyeroberts merged commit cf1b8c3 into huggingface:main Dec 8, 2022

amyeroberts deleted the fix-donut-image-processor branch December 8, 2022 21:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix donut image processor #20625

Fix donut image processor #20625

Uh oh!

amyeroberts commented Dec 6, 2022 •

edited

Loading

Uh oh!

HuggingFaceDocBuilderDev commented Dec 6, 2022 •

edited

Loading

Uh oh!

sgugger left a comment

Uh oh!

ydshieh left a comment

Uh oh!

amyeroberts commented Dec 8, 2022

Uh oh!

amyeroberts commented Dec 8, 2022

Uh oh!

sgugger left a comment

Uh oh!

ydshieh left a comment

Uh oh!

ydshieh Dec 8, 2022

Uh oh!

amyeroberts Dec 8, 2022

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Fix donut image processor #20625

Fix donut image processor #20625

Uh oh!

Conversation

amyeroberts commented Dec 6, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Changing resizing logic for thumbnail method

Before submitting

Uh oh!

HuggingFaceDocBuilderDev commented Dec 6, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sgugger left a comment

Choose a reason for hiding this comment

Uh oh!

ydshieh left a comment

Choose a reason for hiding this comment

Uh oh!

amyeroberts commented Dec 8, 2022

Uh oh!

amyeroberts commented Dec 8, 2022

Uh oh!

sgugger left a comment

Choose a reason for hiding this comment

Uh oh!

ydshieh left a comment

Choose a reason for hiding this comment

Uh oh!

ydshieh Dec 8, 2022

Choose a reason for hiding this comment

Uh oh!

amyeroberts Dec 8, 2022

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

amyeroberts commented Dec 6, 2022 •

edited

Loading

Changing resizing logic for `thumbnail` method

HuggingFaceDocBuilderDev commented Dec 6, 2022 •

edited

Loading