Skip to content

Conversation

@amyeroberts
Copy link
Contributor

@amyeroberts amyeroberts commented Dec 6, 2022

What does this PR do?

This PR addresses failing integration tests for the Donut image processor which involves four main changes:

  • Resolve bug where size wasn't passed to do_align_axis
  • Remove a bug in the get_resize_output_image_size function which wouldn't take account of max_size (inherited from previous resize without fixing)
  • Update logic for getting output size in thumbnail method - ensuring the image dimensions are never increased.
  • Update test values to reflect changes in resizing logic for thumbnail creation - see notes below.

Changing resizing logic for thumbnail method

The DonutFeatueExtractor used the Pillow thumbnail functionality to resize images which was replaced with reusing resize in the image_transforms library. This was done primarily as image.thumbnail modifies in place and uses Pillow's resize with some additional logic for calculating the output size. Unlike resize which will resize an image to the requested (height, width), thumbnail will produce an image which is no larger than the original image or requested size i.e. it will scale down an image preserving the aspect ratio c.f. Pillow docs.

This is a similar behaviour to torchvision when resizing:

  • the shortest image edge is resized to size (int for torchvision, min(requested_heigh, requsted_width) for Pillow)
  • the other edge is resized to preserve the aspect ratio
  • if the longest edge > max_size, the longest edge is resized to max_size and the shortest edge resized to preserve the aspect ratio.

The calculation of the other dimension to preserve the aspect ratio is slightly different between the libraries. In pytorch the length of the edge is found using int to round, whereas Pillow rounds to the value which produces an aspect ratio closest to the original image. The torchvision resizing logic is replicated in our image transforms library here.

In the test tests/models/vision_encoder_decoder/test_modeling_vision_encoder_decoder.py::DonutModelIntegrationTest::test_inference_docvqa, the input image to the thumbnail method has dimension (3713, 1920). The requested size is (2560, 1920). image.thumbnail will resize to (2560, 1373) and our resizing logic (matching torchvision) will resize to (2560, 1374).

As using torchvision resizing logic is more consistent with the rest of the library; Donut is the only model in the library that used the Pillow thumbnail functionality, and is more experimental than other models; I considered this to be an acceptable change.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Dec 6, 2022

The documentation is not available anymore as the PR was closed or merged.

@amyeroberts amyeroberts marked this pull request as ready for review December 6, 2022 19:48
Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

Copy link
Collaborator

@ydshieh ydshieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @amyeroberts . LGTM, but I don't see any change related to

Resolve bug where size wasn't passed to do_align_axis

Do I miss anything?

@amyeroberts
Copy link
Contributor Author

Thanks @amyeroberts . LGTM, but I don't see any change related to

Resolve bug where size wasn't passed to do_align_axis

Do I miss anything?

Nope - I've pushed it now :)

@amyeroberts amyeroberts force-pushed the fix-donut-image-processor branch from 14aa2c5 to a96e89b Compare December 8, 2022 12:15
@amyeroberts
Copy link
Contributor Author

@sgugger @ydshieh This also uncovered another sneaky bug when resizing:

  • When resizing, the image is coverted to PIL.Image.Image from numpy. The channel dimension format of the input image, before resizing inferred is also inferred.
  • When the image is converted back to numpy the image is always in "ChannelDimension.LAST" format
  • A final to_channel_dimension_format call is made to make sure the output resized image is in the same channel dimension format as the input.
  • In to_channel_dimension_format the input image (resized in this case) channel dimension format is inferred and compared to the requested format.
  • If the height dimensions is of size 3 or 1, then format is incorrectly inferred as ChannelDimension.FIRST
  • This resulted in images in the incorrect format being returned after resizing

For practical purposes, this doesn't cause an issue as it's very unlikely an image has a height dimension of 3. However, it results in flaky tests and is a bug.

I've added an optional input_channel_dimension argument to to_channel_dimension_format which resolves this and additional tests for our resize functionality which previously failed and now pass with this update.

Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me, thanks for the deep cleaning!

Copy link
Collaborator

@ydshieh ydshieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again

resized_image = to_channel_dimension_format(
resized_image, data_format, input_channel_dim=ChannelDimension.LAST
)
# resized_image = to_channel_dimension_format(resized_image, data_format)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's clean this line before merge :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done :)

@amyeroberts amyeroberts merged commit cf1b8c3 into huggingface:main Dec 8, 2022
@amyeroberts amyeroberts deleted the fix-donut-image-processor branch December 8, 2022 21:54
mpierrau pushed a commit to mpierrau/transformers that referenced this pull request Dec 15, 2022
* fix donut image processor

* Update test values

* Apply lower bound on resizing size

* Add in missing size param

* Resolve resize channel_dimension bug

* Update src/transformers/image_transforms.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants