Fix donut image processor #20625
The documentation is not available anymore as the PR was closed or merged.
sgugger left a comment
Thanks for the fix!
ydshieh left a comment
Thanks @amyeroberts. LGTM, but I don't see any change related to
"Resolve bug where `size` wasn't passed to `do_align_axis`".
Am I missing anything?
Nope - I've pushed it now :)
Branch updated from 14aa2c5 to a96e89b.
@sgugger @ydshieh This also uncovered another sneaky bug when resizing:
For practical purposes, this doesn't cause an issue, as it's very unlikely an image has a height dimension of 3. However, it results in flaky tests and is a bug. I've added an optional
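The bug description above is cut off. Judging from the later `image_transforms.py` diff, which passes `input_channel_dim=ChannelDimension.LAST` explicitly, it appears to involve guessing the channel axis from the array shape. The sketch below is a simplified stand-in, not the library's code: `ChannelDimension` is a local copy of the enum, `infer_channel_dimension` and `to_channels_first` are hypothetical names, and it assumes the heuristic checks the first axis before the last. Under that assumption, a channels-last image whose height happens to be 3 is misread as channels-first, while passing the input layout explicitly avoids the guess.

```python
from enum import Enum
from typing import Optional

import numpy as np


class ChannelDimension(Enum):
    # Local stand-in for the library enum of the same name (assumption).
    FIRST = "channels_first"
    LAST = "channels_last"


def infer_channel_dimension(image: np.ndarray) -> ChannelDimension:
    """Guess the channel axis of a (C, H, W) or (H, W, C) image from its shape.

    Assumed heuristic: the first axis is checked before the last one, so a
    channels-last image with height 1 or 3 is classified as channels-first.
    """
    if image.ndim != 3:
        raise ValueError(f"Expected a 3D image, got shape {image.shape}")
    if image.shape[0] in (1, 3):
        return ChannelDimension.FIRST
    if image.shape[-1] in (1, 3):
        return ChannelDimension.LAST
    raise ValueError(f"Unable to infer channel dimension from shape {image.shape}")


def to_channels_first(
    image: np.ndarray, input_channel_dim: Optional[ChannelDimension] = None
) -> np.ndarray:
    """Return the image as (C, H, W), inferring the input layout only if not given."""
    if input_channel_dim is None:
        input_channel_dim = infer_channel_dimension(image)
    if input_channel_dim is ChannelDimension.FIRST:
        return image
    return np.transpose(image, (2, 0, 1))


# A channels-last (H, W, C) image, e.g. fresh out of a resize, whose height is 3.
resized = np.zeros((3, 4, 3), dtype=np.uint8)

print(infer_channel_dimension(resized))  # ChannelDimension.FIRST (wrong guess)
print(to_channels_first(resized).shape)  # (3, 4, 3): left untouched
print(to_channels_first(resized, ChannelDimension.LAST).shape)  # (3, 3, 4)
```

Passing the layout explicitly, when the caller already knows it, removes the ambiguity without trying to make the shape heuristic smarter.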
sgugger left a comment
Works for me, thanks for the deep cleaning!
ydshieh left a comment
Thanks again
src/transformers/image_transforms.py (outdated)
```python
resized_image = to_channel_dimension_format(
    resized_image, data_format, input_channel_dim=ChannelDimension.LAST
)
# resized_image = to_channel_dimension_format(resized_image, data_format)
```
let's clean this line before merge :-)
Done :)
* fix donut image processor
* Update test values
* Apply lower bound on resizing size
* Add in missing size param
* Resolve resize channel_dimension bug
* Update src/transformers/image_transforms.py
What does this PR do?
This PR addresses the failing integration tests for the Donut image processor and involves four main changes:
* Resolve bug where `size` wasn't passed to `do_align_axis`.
* Resolve bug in the `get_resize_output_image_size` function, which didn't take `max_size` into account (inherited from the previous resize without being fixed).
* Changes to the `thumbnail` method, ensuring the image dimensions are never increased.

Changing resizing logic for `thumbnail` method

The `DonutFeatureExtractor` used Pillow's `thumbnail` functionality to resize images; this was replaced by reusing `resize` from the `image_transforms` library. This was done primarily because `image.thumbnail` modifies the image in place and uses Pillow's resize with some additional logic for calculating the output size. Unlike `resize`, which resizes an image to the requested `(height, width)`, `thumbnail` produces an image that is no larger than the original image or the requested size, i.e. it scales an image down while preserving the aspect ratio (cf. the Pillow docs). This is similar to torchvision's behaviour when resizing:
* The shortest edge is resized to `size` (an int for torchvision, `min(requested_height, requested_width)` for Pillow).
* If the longest edge then exceeds `max_size`, the longest edge is resized to `max_size` and the shortest edge is resized to preserve the aspect ratio.

The calculation of the other dimension to preserve the aspect ratio differs slightly between the libraries. In torchvision, the edge length is found using `int` to round (which truncates), whereas Pillow rounds to the value that produces the aspect ratio closest to the original image's. The torchvision resizing logic is replicated in our image transforms library here.

In the test `tests/models/vision_encoder_decoder/test_modeling_vision_encoder_decoder.py::DonutModelIntegrationTest::test_inference_docvqa`, the input image to the `thumbnail` method has dimensions `(3713, 1920)` and the requested size is `(2560, 1920)`. `image.thumbnail` resizes to `(2560, 1373)`, whereas our resizing logic (matching torchvision) resizes to `(2560, 1374)`.

Using the torchvision resizing logic is more consistent with the rest of the library, Donut is the only model in the library that used Pillow's thumbnail functionality, and Donut is more experimental than other models. I therefore considered this to be an acceptable change.
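To make the rounding difference concrete, here is a pure-Python sketch of the two output-size rules described above. Both function names are hypothetical; `torchvision_style_size` follows the shortest-edge/`max_size` description with `int()` truncation, and `pillow_style_thumbnail_size` is modelled on the closest-aspect-ratio rounding of Pillow's `Image.thumbnail` (an approximation; Pillow's exact code may vary between versions). In this sketch the `max_size` cap is applied even when the shortest edge already equals `size`. The example inputs are illustrative and are not the test image from the PR.

```python
import math
from typing import Optional, Tuple


def torchvision_style_size(
    height: int, width: int, size: int, max_size: Optional[int] = None
) -> Tuple[int, int]:
    """Shortest edge -> size, longest edge capped at max_size, int() truncation."""
    short, long = (width, height) if width <= height else (height, width)
    new_short, new_long = size, int(size * long / short)
    # The cap is checked unconditionally: no early return when short == size.
    if max_size is not None and new_long > max_size:
        new_short, new_long = int(max_size * new_short / new_long), max_size
    return (new_long, new_short) if width <= height else (new_short, new_long)


def pillow_style_thumbnail_size(
    height: int, width: int, max_height: int, max_width: int
) -> Tuple[int, int]:
    """Fit inside (max_height, max_width) without upscaling; returns (height, width)."""
    if max_width >= width and max_height >= height:
        return height, width  # already fits: dimensions are never increased
    aspect = width / height

    def round_aspect(number: float, key) -> int:
        # Choose floor or ceil, whichever keeps the aspect ratio closest to the original.
        return max(min(math.floor(number), math.ceil(number), key=key), 1)

    x, y = max_width, max_height
    if x / y >= aspect:
        x = round_aspect(y * aspect, key=lambda n: abs(aspect - n / y))
    else:
        y = round_aspect(x / aspect, key=lambda n: 0 if n == 0 else abs(aspect - x / n))
    return y, x


# Portrait image (height 1000, width 667) scaled so its width becomes 100.
print(torchvision_style_size(1000, 667, size=100))  # (149, 100): 149.925 truncated
print(pillow_style_thumbnail_size(1000, 667, max_height=200, max_width=100))  # (150, 100)
# max_size still applies when the shortest edge already equals size.
print(torchvision_style_size(1000, 667, size=667, max_size=800))  # (800, 533)
# A thumbnail never upscales an image that already fits the box.
print(pillow_style_thumbnail_size(100, 50, max_height=200, max_width=200))  # (100, 50)
```

Truncation and closest-aspect rounding agree whenever the exact edge length has a small fractional part, and can differ by one pixel otherwise, which is enough to change integration-test values.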