forked from huggingface/transformers
BaseImageProcessor #26
Merged
62 commits
54aed8b Base processor skeleton (amyeroberts)
ba55c89 BatchFeature for packaging image processor outputs (amyeroberts)
4b430d4 Initial image processor for GLPN (amyeroberts)
b1c8b59 REmove accidental import (amyeroberts)
b9ce4a0 Import BatchFeature from feature_extraction_utils (amyeroberts)
6b678fb Fixup and docs (amyeroberts)
db93437 Fixup and docs (amyeroberts)
bd890d5 Fixup and docs (amyeroberts)
4b27a34 Fixup and docs (amyeroberts)
ff0d49e BatchFeature for packaging image processor outputs (amyeroberts)
2c2fa9a Import BatchFeature from feature_extraction_utils (amyeroberts)
b9f7837 Merge branch 'image-processor-mixin' into base-image-processor-class (amyeroberts)
346270d Resolve conflicts (amyeroberts)
7faf2e6 Import BatchFeature from feature_extraction_utils (amyeroberts)
ccc15fb Fixup and docs (amyeroberts)
c8f8eb6 Fixup and docs (amyeroberts)
90093f4 BatchFeature for packaging image processor outputs (amyeroberts)
d89c051 Import BatchFeature from feature_extraction_utils (amyeroberts)
9bc9157 Fixup and docs (amyeroberts)
6ec382a Mixin for saving the image processor (amyeroberts)
56ee6ad Fixup and docs (amyeroberts)
38ebb50 Merge branch 'image-batch-feature' into image-processor-glpn (amyeroberts)
6b88d5f Add rescale back and remove ImageType (amyeroberts)
67077f1 fix import mistake (amyeroberts)
fb6438c Merge branch 'image-processor-mixin' into base-image-processor-class (amyeroberts)
ffe71b6 Merge branch 'base-image-processor-class' into image-batch-feature (amyeroberts)
cc480e8 Merge branch 'image-batch-feature' into image-processor-glpn (amyeroberts)
4264d1a Merge branch 'image-processor-mixin' into base-image-processor-class (amyeroberts)
fb5dcd6 Merge in branch and remove conflicts (amyeroberts)
43f561d Add in rescaling (amyeroberts)
60c56e5 Data format flag for rescale (amyeroberts)
9294dbc Fix typo (amyeroberts)
936de65 Merge branch 'image-processor-mixin' into base-image-processor-class (amyeroberts)
627c048 Merge branch 'base-image-processor-class' into image-batch-feature (amyeroberts)
1b64c80 Merge branch 'image-batch-feature' into image-processor-glpn (amyeroberts)
88b82e9 Fixes to make IP and FE outputs match (amyeroberts)
2117b94 Merge branch 'image-processor-mixin' into base-image-processor-class (amyeroberts)
68de952 Resole merge conflicts (amyeroberts)
5208680 Merge branch 'image-batch-feature' into image-processor-glpn (amyeroberts)
9514d54 Merge branch 'image-processor-mixin' into base-image-processor-class (amyeroberts)
8f63b76 Merge branch 'base-image-processor-class' into image-batch-feature (amyeroberts)
46a9c74 Merge branch 'image-batch-feature' into image-processor-glpn (amyeroberts)
082e4ff Remove default to numpy batching (amyeroberts)
bf73358 Fix up (amyeroberts)
34b6b2f Add docstring and model_input_types (amyeroberts)
8678c13 Merge branch 'image-processor-mixin' into base-image-processor-class (amyeroberts)
937884c Merge branch 'base-image-processor-class' into image-batch-feature (amyeroberts)
a1b681a Merge branch 'image-batch-feature' into image-processor-glpn (amyeroberts)
952c2a0 Resolve merge conflicts (amyeroberts)
2f0fa0b Resolve merge conflicts (amyeroberts)
e6233cc Resolve merge conflicts (amyeroberts)
bd0afd6 Merge branch 'image-processor-mixin' into base-image-processor-class (amyeroberts)
a6f69bc Merge branch 'base-image-processor-class' into image-batch-feature (amyeroberts)
a7af81f Merge and resolve conflicts (amyeroberts)
b66d0f6 Merge branch 'image-processor-mixin' into base-image-processor-class (amyeroberts)
8b73f89 Merge branch 'base-image-processor-class' into image-batch-feature (amyeroberts)
ae6030c Merge branch 'image-batch-feature' into image-processor-glpn (amyeroberts)
7a4d22a Fix up (amyeroberts)
790c2c6 Apply suggestions from code review (amyeroberts)
2e929cf Update src/transformers/image_transforms.py (amyeroberts)
ae35873 Add in docstrings (amyeroberts)
4fff267 Merge pull request #23 from amyeroberts/image-processor-glpn (amyeroberts)
@@ -0,0 +1,180 @@

    # coding=utf-8
    # Copyright 2022 The HuggingFace Inc. team. All rights reserved.
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    """Image processor class for GLPN."""

    from typing import List, Optional, Union

    import numpy as np
    import PIL.Image

    from transformers.utils.generic import TensorType

    from ...image_processing_utils import BaseImageProcessor, BatchFeature
    from ...image_transforms import rescale, resize, to_channel_dimension_format
    from ...image_utils import ChannelDimension, get_image_size, is_batched, to_numpy_array, valid_images
    from ...utils import logging


    logger = logging.get_logger(__name__)


    class GLPNImageProcessor(BaseImageProcessor):
        r"""
        Constructs a GLPN image processor.

        Args:
            do_resize (`bool`, *optional*, defaults to `True`):
                Set the class default for the `do_resize` parameter. Controls whether to resize the image's (height, width)
                dimensions, rounding them down to the closest multiple of `size_divisor`.
            do_rescale (`bool`, *optional*, defaults to `True`):
                Set the class default for the `do_rescale` parameter. Controls whether or not to apply the scaling factor
                (to make pixel values floats between 0. and 1.).
            size_divisor (`int`, *optional*, defaults to 32):
                Set the class default for the `size_divisor` parameter. When `do_resize` is `True`, images are resized so
                their height and width are rounded down to the closest multiple of `size_divisor`.
            resample (`PIL.Image.Resampling`, *optional*, defaults to `PIL.Image.Resampling.BILINEAR`):
                Set the class default for `resample`. Defines the resampling filter to use if resizing the image.
        """

        model_input_names = ["pixel_values"]

        def __init__(
            self, do_resize=True, do_rescale=True, size_divisor=32, resample=PIL.Image.Resampling.BILINEAR, **kwargs
        ) -> None:
            self.do_resize = do_resize
            self.do_rescale = do_rescale
            self.size_divisor = size_divisor
            self.resample = resample
            super().__init__(**kwargs)

        def resize(
            self,
            image: np.ndarray,
            size_divisor: int,
            resample: PIL.Image.Resampling,
            data_format: Optional[ChannelDimension] = None,
            **kwargs
        ) -> np.ndarray:
            """
            Resize the image, rounding the (height, width) dimensions down to the closest multiple of size_divisor.

            If the image is of dimension (3, 260, 170) and size_divisor is 32, the image will be resized to (3, 256, 160).

            Args:
                image (`np.ndarray`):
                    The image to resize.
                size_divisor (`int`):
                    The image is resized so its height and width are rounded down to the closest multiple of
                    `size_divisor`.
                resample (`PIL.Image.Resampling`):
                    Resampling filter to use when resizing the image.
                data_format (`ChannelDimension`, *optional*):
                    The channel dimension format for the output image. If `None`, the channel dimension format of the input
                    image is used. Can be one of:
                        - `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
                        - `ChannelDimension.LAST`: image in (height, width, num_channels) format.
            """
            height, width = get_image_size(image)
            # Rounds the height and width down to the closest multiple of size_divisor
            new_h = height // size_divisor * size_divisor
            new_w = width // size_divisor * size_divisor
            image = resize(image, (new_h, new_w), resample=resample, data_format=data_format, **kwargs)
            return image

        def rescale(
            self, image: np.ndarray, scale: Union[int, float], data_format: Optional[ChannelDimension] = None, **kwargs
        ) -> np.ndarray:
            """
            Rescale the image by the given scaling factor `scale`.

            Args:
                image (`np.ndarray`):
                    The image to rescale.
                scale (`int` or `float`):
                    The scaling factor to rescale pixel values by.
                data_format (`ChannelDimension`, *optional*):
                    The channel dimension format for the output image. If `None`, the channel dimension format of the input
                    image is used. Can be one of:
                        - `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
                        - `ChannelDimension.LAST`: image in (height, width, num_channels) format.
            """
            return rescale(image=image, scale=scale, data_format=data_format, **kwargs)

        def preprocess(
            self,
            images: Union["PIL.Image.Image", TensorType, List["PIL.Image.Image"], List[TensorType]],
            do_resize: bool = None,
            do_rescale: bool = None,
            size_divisor: int = None,
            resample: PIL.Image.Resampling = None,
            return_tensors: Optional[Union[TensorType, str]] = None,
            data_format: ChannelDimension = ChannelDimension.FIRST,
            **kwargs
        ) -> BatchFeature:
            """
            Preprocess the given images.

            Args:
                images (`PIL.Image.Image` or `TensorType` or `List[np.ndarray]` or `List[TensorType]`):
                    The image or images to preprocess.
                do_resize (`bool`, *optional*, defaults to `self.do_resize`):
                    Whether to resize the input such that the (height, width) dimensions are a multiple of `size_divisor`.
                do_rescale (`bool`, *optional*, defaults to `self.do_rescale`):
                    Whether or not to apply the scaling factor (to make pixel values floats between 0. and 1.).
                size_divisor (`int`, *optional*, defaults to `self.size_divisor`):
                    When `do_resize` is `True`, images are resized so their height and width are rounded down to the
                    closest multiple of `size_divisor`.
                resample (`int`, *optional*, defaults to `self.resample`):
                    Resampling filter to use if resizing the image. This can be one of the enum `PIL.Image.Resampling`,
                    Only has an effect if `do_resize` is set to `True`.
                return_tensors (`str`, *optional*, defaults to `None`):
                    The type of tensors to return. Can be one of:
                        - `None`: Return a list of `np.ndarray`.
                        - `TensorType.TENSORFLOW` or `'tf'`: Return a batch of type `tf.Tensor`.
                        - `TensorType.PYTORCH` or `'pt'`: Return a batch of type `torch.Tensor`.
                        - `TensorType.NUMPY` or `'np'`: Return a batch of type `np.ndarray`.
                        - `TensorType.JAX` or `'jax'`: Return a batch of type `jax.numpy.ndarray`.
                data_format (`ChannelDimension`, *optional*, defaults to `ChannelDimension.FIRST`):
                    The channel dimension format for the output image. Can be one of:
                        - `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
                        - `ChannelDimension.LAST`: image in (height, width, num_channels) format.
            """
            do_resize = do_resize if do_resize is not None else self.do_resize
            do_rescale = do_rescale if do_rescale is not None else self.do_rescale
            size_divisor = size_divisor if size_divisor is not None else self.size_divisor
            resample = resample if resample is not None else self.resample

            if do_resize and size_divisor is None:
                raise ValueError("size_divisor is required for resizing")

            if not is_batched(images):
                images = [images]

            if not valid_images(images):
                raise ValueError("Invalid image(s)")

            # All transformations expect numpy arrays.
            images = [to_numpy_array(img) for img in images]

            if do_resize:
                images = [self.resize(image, size_divisor=size_divisor, resample=resample) for image in images]

            if do_rescale:
                images = [self.rescale(image, scale=1 / 255) for image in images]

            images = [to_channel_dimension_format(image, data_format) for image in images]

            data = {"pixel_values": images}
            return BatchFeature(data=data, tensor_type=return_tensors)
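
The rounding that `resize` applies to the height and width can be checked in isolation. The following is a minimal, dependency-free sketch of that arithmetic only; `round_down_to_multiple` and `resized_hw` are hypothetical helper names introduced for illustration and are not part of the library.

```python
from typing import Tuple


def round_down_to_multiple(value: int, divisor: int) -> int:
    """Round `value` down to the closest multiple of `divisor` (hypothetical helper)."""
    if divisor <= 0:
        raise ValueError("divisor must be a positive integer")
    # Same expression as in the diff: floor-divide, then multiply back.
    return value // divisor * divisor


def resized_hw(height: int, width: int, size_divisor: int = 32) -> Tuple[int, int]:
    """Return the (height, width) an image would be resized to (hypothetical helper)."""
    return (
        round_down_to_multiple(height, size_divisor),
        round_down_to_multiple(width, size_divisor),
    )


# The example from the `resize` docstring: (3, 260, 170) with size_divisor=32.
print(resized_hw(260, 170))  # (256, 160)
```

One consequence of rounding down is that a dimension smaller than `size_divisor` maps to 0 (for example, `round_down_to_multiple(31, 32)` is 0), so very small inputs may need separate handling.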
A quick question - are we going to default to "np"? If so, maybe we can remove None from the accepted argument types or make it a non-optional argument.
It depends partly on whether we merge in: huggingface#18499

Defaulting to `"np"` isn't necessary to be able to use different combinations of e.g. `do_resize` and `do_normalize`. As we're aliasing the previous feature extractors with the new image processors, changing the default would still be a breaking change.

If we decided to default to `"np"`, then we'd have to include additional checks on the processed images. At the moment, because `resize` resizes the images to multiples of `size_divisor`, they are not guaranteed to all be the same size. This means calls to `BatchFeature` will fail if any of `"np"`, `"tf"`, `"pt"` or `"jax"` are passed in, as the images can't be batched together.

My preference would be to keep `return_tensors=None` as this more closely matches the behaviour of our tokenizers. However, our `tokenizer` provides arguments such that batches can be created e.g. `padding=True`. Not sure if an equivalent makes sense here.

What do you think? If we want to set "np" as default we should discuss how to handle introducing the image processors versus introducing that change.
cc @NielsRogge
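
To make the batching concern concrete, here is a minimal sketch in plain Python (no library calls) of why differently sized inputs cannot be stacked into one tensor after resizing to multiples of `size_divisor`. The helper names `rounded_size`, `can_batch` and `padded_batch_size` are hypothetical, and padding to the largest size is only an assumed analogue of the tokenizer's `padding=True`; nothing in this PR implements it.

```python
from typing import List, Tuple

Size = Tuple[int, int]  # (height, width)


def rounded_size(size: Size, size_divisor: int = 32) -> Size:
    """Size after rounding each dimension down to a multiple of size_divisor."""
    height, width = size
    return (height // size_divisor * size_divisor, width // size_divisor * size_divisor)


def can_batch(sizes: List[Size]) -> bool:
    """A single stacked tensor requires every image to share one (height, width)."""
    return len(set(sizes)) <= 1


def padded_batch_size(sizes: List[Size]) -> Size:
    """Assumed padding strategy: pad every image up to the largest height and width."""
    return (max(h for h, _ in sizes), max(w for _, w in sizes))


inputs = [(260, 170), (480, 640)]
resized = [rounded_size(size) for size in inputs]

print(resized)                     # [(256, 160), (480, 640)]
print(can_batch(resized))          # False: the sizes differ, so stacking would fail
print(padded_batch_size(resized))  # (480, 640)
```

With `return_tensors=None` the mismatch is harmless because the output stays a list of arrays; it only matters once a tensor type is requested.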