Merged
62 commits
54aed8b
Base processor skeleton
amyeroberts Jul 27, 2022
ba55c89
BatchFeature for packaging image processor outputs
amyeroberts Jul 27, 2022
4b430d4
Initial image processor for GLPN
amyeroberts Jul 27, 2022
b1c8b59
REmove accidental import
amyeroberts Jul 27, 2022
b9ce4a0
Import BatchFeature from feature_extraction_utils
amyeroberts Jul 28, 2022
6b678fb
Fixup and docs
amyeroberts Jul 28, 2022
db93437
Fixup and docs
amyeroberts Jul 28, 2022
bd890d5
Fixup and docs
amyeroberts Jul 28, 2022
4b27a34
Fixup and docs
amyeroberts Jul 28, 2022
ff0d49e
BatchFeature for packaging image processor outputs
amyeroberts Jul 27, 2022
2c2fa9a
Import BatchFeature from feature_extraction_utils
amyeroberts Jul 28, 2022
b9f7837
Merge branch 'image-processor-mixin' into base-image-processor-class
amyeroberts Jul 28, 2022
346270d
Resolve conflicts
amyeroberts Jul 28, 2022
7faf2e6
Import BatchFeature from feature_extraction_utils
amyeroberts Jul 28, 2022
ccc15fb
Fixup and docs
amyeroberts Jul 28, 2022
c8f8eb6
Fixup and docs
amyeroberts Jul 28, 2022
90093f4
BatchFeature for packaging image processor outputs
amyeroberts Jul 27, 2022
d89c051
Import BatchFeature from feature_extraction_utils
amyeroberts Jul 28, 2022
9bc9157
Fixup and docs
amyeroberts Jul 28, 2022
6ec382a
Mixin for saving the image processor
amyeroberts Jul 27, 2022
56ee6ad
Fixup and docs
amyeroberts Jul 28, 2022
38ebb50
Merge branch 'image-batch-feature' into image-processor-glpn
amyeroberts Jul 28, 2022
6b88d5f
Add rescale back and remove ImageType
amyeroberts Jul 28, 2022
67077f1
fix import mistake
amyeroberts Jul 28, 2022
fb6438c
Merge branch 'image-processor-mixin' into base-image-processor-class
amyeroberts Jul 28, 2022
ffe71b6
Merge branch 'base-image-processor-class' into image-batch-feature
amyeroberts Jul 28, 2022
cc480e8
Merge branch 'image-batch-feature' into image-processor-glpn
amyeroberts Jul 28, 2022
4264d1a
Merge branch 'image-processor-mixin' into base-image-processor-class
amyeroberts Jul 28, 2022
fb5dcd6
Merge in branch and remove conflicts
amyeroberts Jul 28, 2022
43f561d
Add in rescaling
amyeroberts Jul 29, 2022
60c56e5
Data format flag for rescale
amyeroberts Jul 29, 2022
9294dbc
Fix typo
amyeroberts Jul 29, 2022
936de65
Merge branch 'image-processor-mixin' into base-image-processor-class
amyeroberts Jul 29, 2022
627c048
Merge branch 'base-image-processor-class' into image-batch-feature
amyeroberts Jul 29, 2022
1b64c80
Merge branch 'image-batch-feature' into image-processor-glpn
amyeroberts Jul 29, 2022
88b82e9
Fixes to make IP and FE outputs match
amyeroberts Jul 29, 2022
2117b94
Merge branch 'image-processor-mixin' into base-image-processor-class
amyeroberts Aug 2, 2022
68de952
Resole merge conflicts
amyeroberts Aug 2, 2022
5208680
Merge branch 'image-batch-feature' into image-processor-glpn
amyeroberts Aug 2, 2022
9514d54
Merge branch 'image-processor-mixin' into base-image-processor-class
amyeroberts Aug 2, 2022
8f63b76
Merge branch 'base-image-processor-class' into image-batch-feature
amyeroberts Aug 2, 2022
46a9c74
Merge branch 'image-batch-feature' into image-processor-glpn
amyeroberts Aug 2, 2022
082e4ff
Remove default to numpy batching
amyeroberts Aug 2, 2022
bf73358
Fix up
amyeroberts Aug 3, 2022
34b6b2f
Add docstring and model_input_types
amyeroberts Aug 4, 2022
8678c13
Merge branch 'image-processor-mixin' into base-image-processor-class
amyeroberts Aug 4, 2022
937884c
Merge branch 'base-image-processor-class' into image-batch-feature
amyeroberts Aug 4, 2022
a1b681a
Merge branch 'image-batch-feature' into image-processor-glpn
amyeroberts Aug 4, 2022
952c2a0
Resolve merge conflicts
amyeroberts Aug 5, 2022
2f0fa0b
Resolve merge conflicts
amyeroberts Aug 5, 2022
e6233cc
Resolve merge conflicts
amyeroberts Aug 5, 2022
bd0afd6
Merge branch 'image-processor-mixin' into base-image-processor-class
amyeroberts Aug 5, 2022
a6f69bc
Merge branch 'base-image-processor-class' into image-batch-feature
amyeroberts Aug 5, 2022
a7af81f
Merge and resolve conflicts
amyeroberts Aug 5, 2022
b66d0f6
Merge branch 'image-processor-mixin' into base-image-processor-class
amyeroberts Aug 7, 2022
8b73f89
Merge branch 'base-image-processor-class' into image-batch-feature
amyeroberts Aug 7, 2022
ae6030c
Merge branch 'image-batch-feature' into image-processor-glpn
amyeroberts Aug 7, 2022
7a4d22a
Fix up
amyeroberts Aug 8, 2022
790c2c6
Apply suggestions from code review
amyeroberts Aug 10, 2022
2e929cf
Update src/transformers/image_transforms.py
amyeroberts Aug 12, 2022
ae35873
Add in docstrings
amyeroberts Aug 17, 2022
4fff267
Merge pull request #23 from amyeroberts/image-processor-glpn
amyeroberts Aug 17, 2022
5 changes: 4 additions & 1 deletion docs/source/en/internal/image_processing_utils.mdx
@@ -19,10 +19,13 @@ Most of those are only useful if you are studying the code of the image processo

## Image Transformations

[[autodoc]] image_transforms.to_pil_image
[[autodoc]] image_transforms.rescale

[[autodoc]] image_transforms.resize

[[autodoc]] image_transforms.to_pil_image



## ImageProcessorMixin

4 changes: 2 additions & 2 deletions src/transformers/__init__.py
@@ -635,7 +635,7 @@
     ]
 else:
     _import_structure["image_processing_utils"] = ["ImageProcessorMixin"]
-    _import_structure["image_transforms"] = ["resize", "to_pil_image"]
+    _import_structure["image_transforms"] = ["rescale", "resize", "to_pil_image"]
     _import_structure["image_utils"] = ["ImageFeatureExtractionMixin"]
     _import_structure["models.beit"].append("BeitFeatureExtractor")
     _import_structure["models.clip"].append("CLIPFeatureExtractor")
@@ -3372,7 +3372,7 @@
         from .utils.dummy_vision_objects import *
     else:
         from .image_processing_utils import ImageProcessorMixin
-        from .image_transforms import resize, to_pil_image
+        from .image_transforms import rescale, resize, to_pil_image
         from .image_utils import ImageFeatureExtractionMixin
         from .models.beit import BeitFeatureExtractor
         from .models.clip import CLIPFeatureExtractor, CLIPProcessor
30 changes: 30 additions & 0 deletions src/transformers/image_processing_utils.py
@@ -13,13 +13,43 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

+from .feature_extraction_utils import BatchFeature as BaseBatchFeature
 from .feature_extraction_utils import FeatureExtractionMixin
 from .utils import logging


 logger = logging.get_logger(__name__)


+# TODO: Move BatchFeature to be imported by both feature_extraction_utils and image_processing_utils
+# We override the class string here, but logic is the same.
+class BatchFeature(BaseBatchFeature):
+    r"""
+    Holds the output of the image processor specific `__call__` methods.
+
+    This class is derived from a python dictionary and can be used as a dictionary.
+
+    Args:
+        data (`dict`):
+            Dictionary of lists/arrays/tensors returned by the __call__/pad methods ('pixel_values', 'attention_mask',
+            etc.).
+        tensor_type (`Union[None, str, TensorType]`, *optional*):
Collaborator

A quick question: are we going to default to "np"? If so, maybe we can remove `None` from the accepted argument types or make it a non-optional argument.

Owner Author

It depends partly on whether we merge in huggingface#18499.

Defaulting to "np" isn't necessary to be able to use different combinations of e.g. `do_resize` and `do_normalize`. As we're aliasing the previous feature extractors with the new image processors, changing the default would still be a breaking change.

If we decided to default to "np", then we'd have to include additional checks on the processed images. At the moment, because `resize` resizes the images to multiples of `size_divisor`, they are not guaranteed to all be the same size. This means calls to `BatchFeature` will fail if any of "np", "tf", "pt" or "jax" is passed in, as the images can't be batched together.

My preference would be to keep `return_tensors=None`, as this more closely matches the behaviour of our tokenizers. However, our tokenizers provide arguments such that batches can be created, e.g. `padding=True`. Not sure if an equivalent makes sense here.

What do you think? If we want to set "np" as the default, we should discuss how to handle introducing the image processors versus introducing that change.
cc @NielsRogge
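
To make the batching concern in the comment above concrete, here is a minimal sketch. It assumes two hypothetical input sizes and uses plain `np.stack` as a stand-in for the tensor conversion that `BatchFeature` performs; it is not the library's own code path.

```python
import numpy as np

size_divisor = 32
# Two hypothetical (height, width) inputs; values chosen only for illustration.
sizes = [(260, 170), (300, 200)]

# Round each dimension down to the closest multiple of size_divisor,
# then build a channels-first placeholder image of that size.
resized = [
    np.zeros((3, h // size_divisor * size_divisor, w // size_divisor * size_divisor), dtype=np.float32)
    for h, w in sizes
]
shapes = [image.shape for image in resized]

# Stacking into a single batch only works when every image has the same shape.
try:
    batch = np.stack(resized)
    stack_error = None
except ValueError as err:
    stack_error = err
```

Here the two images end up as `(3, 256, 160)` and `(3, 288, 192)`, so stacking raises a `ValueError`. With `return_tensors=None` the images stay in a list and no stacking is attempted.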

+            You can give a tensor_type here to convert the lists of integers in PyTorch/TensorFlow/Numpy Tensors at
+            initialization.
+    """
+
+
 # We use aliasing whilst we phase out the old API. Once feature extractors for vision models
 # are deprecated, ImageProcessor mixin will be implemented. Any shared logic will be abstracted out.
 ImageProcessorMixin = FeatureExtractionMixin
+
+
+class BaseImageProcessor(ImageProcessorMixin):
+    def __init__(self, **kwargs):
+        super().__init__(**kwargs)
+
+    def __call__(self, images, **kwargs) -> BatchFeature:
+        return self.preprocess(images, **kwargs)
+
+    def preprocess(self, images, **kwargs) -> BatchFeature:
+        raise NotImplementedError("Each image processor must implement its own preprocess method")
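
The `BaseImageProcessor` contract can be sketched without the rest of the library: `__call__` only forwards to `preprocess`, and each subclass supplies its own `preprocess`. In the sketch below, `SketchImageProcessor` and `DoublingProcessor` are hypothetical stand-ins, and a plain `dict` replaces `BatchFeature`.

```python
from typing import Any, Dict, List


class SketchImageProcessor:
    """Stand-in for BaseImageProcessor: __call__ simply forwards to preprocess."""

    def __call__(self, images, **kwargs) -> Dict[str, Any]:
        return self.preprocess(images, **kwargs)

    def preprocess(self, images, **kwargs) -> Dict[str, Any]:
        raise NotImplementedError("Each image processor must implement its own preprocess method")


class DoublingProcessor(SketchImageProcessor):
    """Hypothetical subclass: its 'preprocessing' just multiplies each value by a factor."""

    def preprocess(self, images: List[float], factor: float = 2.0, **kwargs) -> Dict[str, Any]:
        return {"pixel_values": [value * factor for value in images]}


processor = DoublingProcessor()
# Keyword arguments given to __call__ are passed straight through to preprocess.
output = processor([1.0, 2.0], factor=3.0)
```

Calling the base class directly raises `NotImplementedError`, which is how a subclass that forgets to implement `preprocess` would surface.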
42 changes: 34 additions & 8 deletions src/transformers/image_transforms.py
@@ -64,15 +64,42 @@ def to_channel_dimension_format(image: np.ndarray, channel_dim: Union[ChannelDim
raise ValueError("Unsupported channel dimension format: {}".format(channel_dim))


+def rescale(
+    image: np.ndarray, scale: Union[float, int] = 255, data_format: Optional[ChannelDimension] = None, dtype=np.float32
+) -> np.ndarray:
+    """
+    Rescales `image` by `scale`.
+
+    Args:
+        image (`np.ndarray`):
+            The image to rescale.
+        scale (`float` or `int`, *optional*, defaults to 255):
+            The scale to use for rescaling the image.
+        data_format (`ChannelDimension`, *optional*):
+            The channel dimension format of the image. If not provided, it will be the same as the input image.
+        dtype (`np.dtype`, *optional*, defaults to `np.float32`):
+            The dtype of the output image. Defaults to `np.float32`. Used for backwards compatibility with feature
+            extractors.
+
+    Returns:
+        image: A rescaled np.ndarray image.
+    """
+    rescaled_image = image * scale
+    if data_format is not None:
+        rescaled_image = to_channel_dimension_format(rescaled_image, data_format)
+    rescaled_image = rescaled_image.astype(dtype)
+    return rescaled_image
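
A minimal sketch of what the new `rescale` does to pixel values, using only NumPy. `rescale_sketch` is a hypothetical simplification that omits the `data_format` handling shown in the diff.

```python
import numpy as np


def rescale_sketch(image: np.ndarray, scale: float, dtype=np.float32) -> np.ndarray:
    """Multiply pixel values by `scale`, then cast to `dtype` (no data_format handling)."""
    return (image * scale).astype(dtype)


pixels = np.array([[0, 51, 255]], dtype=np.uint8)
# Map uint8 values in [0, 255] to floats in [0, 1].
scaled = rescale_sketch(pixels, scale=1 / 255)
```

Note that `scale` is a multiplier: the standalone function in the diff defaults to 255, while `GLPNImageProcessor.preprocess` passes `1 / 255`.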


 def to_pil_image(
-    image: Union[np.ndarray, PIL.Image.Image, "torch.Tensor", "tf.Tensor"], rescale=None
+    image: Union[np.ndarray, PIL.Image.Image, "torch.Tensor", "tf.Tensor", "jnp.Tensor"], do_rescale=None
 ) -> PIL.Image.Image:
     """
     Converts `image` to a PIL Image. Optionally rescales it and puts the channel dimension back as the last axis if
     needed.

     Args:
-        image (`PIL.Image.Image` or `numpy.ndarray` or `torch.Tensor`):
+        image (`PIL.Image.Image`, `numpy.ndarray`, `torch.Tensor`, `tf.Tensor`):
             The image to convert to the PIL Image format.
         rescale (`bool`, *optional*):
             Whether or not to apply the scaling factor (to make pixel values integers between 0 and 255). Will default
@@ -87,15 +114,15 @@ def to_pil_image(
         image = np.array(image)

     if not isinstance(image, np.ndarray):
-        raise ValueError("Input image must be of type PIL.Image.Image, numpy.ndarray or torch.Tensor")
+        raise ValueError("Input image type not supported: {}".format(type(image)))

     # If the channel as been moved to first dim, we put it back at the end.
     image = to_channel_dimension_format(image, ChannelDimension.LAST)

     # PIL.Image can only store uint8 values, so we rescale the image to be between 0 and 255 if needed.
-    rescale = isinstance(image.flat[0], float) if rescale is None else rescale
-    if rescale:
-        rescale = image * 255
+    do_rescale = isinstance(image.flat[0], float) if do_rescale is None else do_rescale
+    if do_rescale:
+        image = rescale(image, 255)
     image = image.astype(np.uint8)
     return PIL.Image.fromarray(image)
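
One plausible reading of the `rescale` to `do_rescale` rename is name shadowing: once a module-level `rescale` function exists, a parameter with the same name hides it inside the function body. The removed lines also assigned the scaled array to `rescale` rather than `image`. The sketch below illustrates both points with hypothetical helpers (`to_uint8_shadowed`, `to_uint8`) that skip the PIL conversion.

```python
import numpy as np


def rescale(image: np.ndarray, scale: float) -> np.ndarray:
    """Simplified module-level helper: multiply pixel values by `scale`."""
    return image * scale


def to_uint8_shadowed(image: np.ndarray, rescale=None) -> np.ndarray:
    # The boolean parameter hides the module-level `rescale` function,
    # so the call below tries to call a bool and raises TypeError.
    if rescale:
        image = rescale(image, 255)
    return image.astype(np.uint8)


def to_uint8(image: np.ndarray, do_rescale=None) -> np.ndarray:
    # Mirrors the diff: infer the flag from the first element when it is not given.
    do_rescale = isinstance(image.flat[0], float) if do_rescale is None else do_rescale
    if do_rescale:
        image = rescale(image, 255)
    return image.astype(np.uint8)


converted = to_uint8(np.array([0.0, 0.5, 1.0]))
```

A caveat with the inference step: NumPy's `float64` scalars are instances of Python `float`, but `float32` scalars are not, so a `float32` image would only be rescaled when `do_rescale=True` is passed explicitly.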

@@ -186,8 +213,7 @@ def resize(
         data_format (`ChannelDimension`, *optional*, defaults to `None`):
             The channel dimension format of the output image. If `None`, will use the inferred format from the input.
         return_numpy (`bool`, *optional*, defaults to `True`):
-            Whether or not to return the resized image as a numpy array. If False a PIL.Image.Image object is
-            returned.
+            Whether or not to return the resized image as a numpy array. If False a PIL.Image.Image object is returned.

     Returns:
         image: A resized np.ndarray.
33 changes: 29 additions & 4 deletions src/transformers/image_utils.py
@@ -29,14 +29,19 @@
     IMAGENET_STANDARD_MEAN,
     IMAGENET_STANDARD_STD,
 )
-from .utils.generic import ExplicitEnum, _is_jax, _is_tensorflow, _is_torch
+from .utils.generic import ExplicitEnum, _is_jax, _is_tensorflow, _is_torch, to_numpy


 ImageInput = Union[
     PIL.Image.Image, np.ndarray, "torch.Tensor", List[PIL.Image.Image], List[np.ndarray], List["torch.Tensor"]  # noqa
 ]


+class ChannelDimension(ExplicitEnum):
+    FIRST = "channels_first"
+    LAST = "channels_last"
+
+
 def is_torch_tensor(obj):
     return _is_torch(obj) if is_torch_available() else False

@@ -49,9 +54,29 @@ def is_jax_tensor(obj):
     return _is_jax(obj) if is_flax_available() else False


-class ChannelDimension(ExplicitEnum):
-    FIRST = "channels_first"
-    LAST = "channels_last"
+def is_valid_image(img):
+    return (
+        isinstance(img, (PIL.Image.Image, np.ndarray))
+        or is_torch_tensor(img)
+        or is_tf_tensor(img)
+        or is_jax_tensor(img)
+    )
+
+
+def valid_images(imgs):
+    return all(is_valid_image(img) for img in imgs)
+
+
+def is_batched(img):
+    if isinstance(img, (list, tuple)):
+        return is_valid_image(img[0])
+    return False
+
+
+def to_numpy_array(img) -> np.ndarray:
+    if isinstance(img, PIL.Image.Image):
+        return np.array(img)
+    return to_numpy(img)
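
A small sketch of how the `is_batched` check above behaves on a few inputs. It is simplified to NumPy arrays only: the `_sketch` helpers are hypothetical, and the real `is_valid_image` also accepts PIL images and torch/tf/jax tensors.

```python
import numpy as np


def is_valid_image_sketch(img) -> bool:
    # Simplified: only NumPy arrays count as images here.
    return isinstance(img, np.ndarray)


def is_batched_sketch(img) -> bool:
    # Same logic as the diff: a list or tuple whose first element is a valid image.
    if isinstance(img, (list, tuple)):
        return is_valid_image_sketch(img[0])
    return False


single = np.zeros((3, 4, 4))
results = {
    "single_array": is_batched_sketch(single),
    "list_of_arrays": is_batched_sketch([single, single]),
    "4d_array": is_batched_sketch(np.zeros((2, 3, 4, 4))),
}
```

Under this logic only lists and tuples count as batches, so a 4D array is treated as a single image, and an empty list raises `IndexError` at `img[0]`.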


def infer_channel_dimension_format(image: np.ndarray) -> ChannelDimension:
180 changes: 180 additions & 0 deletions src/transformers/models/glpn/image_processing_glpn.py
@@ -0,0 +1,180 @@
# coding=utf-8
# Copyright 2022 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Image processor class for GLPN."""

from typing import List, Optional, Union

import numpy as np
import PIL.Image

from transformers.utils.generic import TensorType

from ...image_processing_utils import BaseImageProcessor, BatchFeature
from ...image_transforms import rescale, resize, to_channel_dimension_format
from ...image_utils import ChannelDimension, get_image_size, is_batched, to_numpy_array, valid_images
from ...utils import logging


logger = logging.get_logger(__name__)


class GLPNImageProcessor(BaseImageProcessor):
    r"""
    Constructs a GLPN image processor.

    Args:
        do_resize (`bool`, *optional*, defaults to `True`):
            Set the class default for the `do_resize` parameter. Controls whether to resize the image's (height, width)
            dimensions, rounding them down to the closest multiple of `size_divisor`.
        do_rescale (`bool`, *optional*, defaults to `True`):
            Set the class default for the `do_rescale` parameter. Controls whether or not to apply the scaling factor
            (to make pixel values floats between 0. and 1.).
        size_divisor (`int`, *optional*, defaults to 32):
            Set the class default for the `size_divisor` parameter. When `do_resize` is `True`, images are resized so
            their height and width are rounded down to the closest multiple of `size_divisor`.
        resample (`PIL.Image.Resampling`, *optional*, defaults to `PIL.Image.Resampling.BILINEAR`):
            Set the class default for `resample`. Defines the resampling filter to use if resizing the image.
    """

    model_input_names = ["pixel_values"]

    def __init__(
        self, do_resize=True, do_rescale=True, size_divisor=32, resample=PIL.Image.Resampling.BILINEAR, **kwargs
    ) -> None:
        self.do_resize = do_resize
        self.do_rescale = do_rescale
        self.size_divisor = size_divisor
        self.resample = resample
        super().__init__(**kwargs)

    def resize(
        self,
        image: np.ndarray,
        size_divisor: int,
        resample: PIL.Image.Resampling,
        data_format: Optional[ChannelDimension] = None,
        **kwargs
    ) -> np.ndarray:
        """
        Resize the image, rounding the (height, width) dimensions down to the closest multiple of size_divisor.

        If the image is of dimension (3, 260, 170) and size_divisor is 32, the image will be resized to (3, 256, 160).

        Args:
            image (`np.ndarray`):
                The image to resize.
            size_divisor (`int`):
                The image is resized so its height and width are rounded down to the closest multiple of
                `size_divisor`.
            resample (`PIL.Image.Resampling`):
                Resampling filter to use when resizing the image.
            data_format (`ChannelDimension`, *optional*):
                The channel dimension format for the output image. If `None`, the channel dimension format of the input
                image is used. Can be one of:
                - `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
                - `ChannelDimension.LAST`: image in (height, width, num_channels) format.
        """
        height, width = get_image_size(image)
        # Rounds the height and width down to the closest multiple of size_divisor
        new_h = height // size_divisor * size_divisor
        new_w = width // size_divisor * size_divisor
        image = resize(image, (new_h, new_w), resample=resample, data_format=data_format, **kwargs)
        return image
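
The rounding step in `resize` is plain integer arithmetic, so the docstring's example can be checked in isolation. `round_down_to_multiple` is a hypothetical helper name used only for this sketch.

```python
def round_down_to_multiple(value: int, divisor: int) -> int:
    """Round `value` down to the closest multiple of `divisor` using floor division."""
    return value // divisor * divisor


# The docstring example: a (3, 260, 170) image with size_divisor=32.
new_size = (round_down_to_multiple(260, 32), round_down_to_multiple(170, 32))

# Edge case: a side shorter than the divisor rounds down to zero.
tiny_side = round_down_to_multiple(20, 32)
```

This reproduces the `(256, 160)` target size from the docstring. The edge case suggests that inputs with a side shorter than `size_divisor` would get a target size of 0, which the diff does not guard against.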

    def rescale(
        self, image: np.ndarray, scale: Union[int, float], data_format: Optional[ChannelDimension] = None, **kwargs
    ) -> np.ndarray:
        """
        Rescale the image by the given scaling factor `scale`.

        Args:
            image (`np.ndarray`):
                The image to rescale.
            scale (`int` or `float`):
                The scaling factor to rescale pixel values by.
            data_format (`ChannelDimension`, *optional*):
                The channel dimension format for the output image. If `None`, the channel dimension format of the input
                image is used. Can be one of:
                - `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
                - `ChannelDimension.LAST`: image in (height, width, num_channels) format.
        """
        return rescale(image=image, scale=scale, data_format=data_format, **kwargs)

    def preprocess(
        self,
        images: Union["PIL.Image.Image", TensorType, List["PIL.Image.Image"], List[TensorType]],
        do_resize: bool = None,
        do_rescale: bool = None,
        size_divisor: int = None,
        resample: PIL.Image.Resampling = None,
        return_tensors: Optional[Union[TensorType, str]] = None,
        data_format: ChannelDimension = ChannelDimension.FIRST,
        **kwargs
    ) -> BatchFeature:
        """
        Preprocess the given images.

        Args:
            images (`PIL.Image.Image` or `TensorType` or `List[np.ndarray]` or `List[TensorType]`):
                The image or images to preprocess.
            do_resize (`bool`, *optional*, defaults to `self.do_resize`):
                Whether to resize the input such that the (height, width) dimensions are a multiple of `size_divisor`.
            do_rescale (`bool`, *optional*, defaults to `self.do_rescale`):
                Whether or not to apply the scaling factor (to make pixel values floats between 0. and 1.).
            size_divisor (`int`, *optional*, defaults to `self.size_divisor`):
                When `do_resize` is `True`, images are resized so their height and width are rounded down to the
                closest multiple of `size_divisor`.
            resample (`int`, *optional*, defaults to `self.resample`):
                Resampling filter to use if resizing the image. This can be one of the enum `PIL.Image.Resampling`,
                Only has an effect if `do_resize` is set to `True`.
            return_tensors (`str`, *optional*, defaults to `None`):
                The type of tensors to return. Can be one of:
                    - `None`: Return a list of `np.ndarray`.
                    - `TensorType.TENSORFLOW` or `'tf'`: Return a batch of type `tf.Tensor`.
                    - `TensorType.PYTORCH` or `'pt'`: Return a batch of type `torch.Tensor`.
                    - `TensorType.NUMPY` or `'np'`: Return a batch of type `np.ndarray`.
                    - `TensorType.JAX` or `'jax'`: Return a batch of type `jax.numpy.ndarray`.
            data_format (`ChannelDimension`, *optional*, defaults to `ChannelDimension.FIRST`):
                The channel dimension format for the output image. Can be one of:
                    - `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
                    - `ChannelDimension.LAST`: image in (height, width, num_channels) format.
        """
        do_resize = do_resize if do_resize is not None else self.do_resize
        do_rescale = do_rescale if do_rescale is not None else self.do_rescale
        size_divisor = size_divisor if size_divisor is not None else self.size_divisor
        resample = resample if resample is not None else self.resample

        if do_resize and size_divisor is None:
            raise ValueError("size_divisor is required for resizing")

        if not is_batched(images):
            images = [images]

        if not valid_images(images):
            raise ValueError("Invalid image(s)")

        # All transformations expect numpy arrays.
        images = [to_numpy_array(img) for img in images]

        if do_resize:
            images = [self.resize(image, size_divisor=size_divisor, resample=resample) for image in images]

        if do_rescale:
            images = [self.rescale(image, scale=1 / 255) for image in images]

        images = [to_channel_dimension_format(image, data_format) for image in images]

        data = {"pixel_values": images}
        return BatchFeature(data=data, tensor_type=return_tensors)
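
The order of operations in `preprocess` (wrap in a list, resize, rescale, move channels first) can be sketched end to end with NumPy alone. This is a rough approximation rather than the processor itself: `preprocess_sketch` is a hypothetical function, nearest-neighbour index sampling stands in for the PIL bilinear resize, inputs are assumed to be channels-last arrays, and a plain `dict` replaces `BatchFeature`.

```python
import numpy as np


def preprocess_sketch(images, do_resize=True, do_rescale=True, size_divisor=32):
    """Simplified pipeline: resize, rescale, then convert to channels-first."""
    if not isinstance(images, (list, tuple)):
        images = [images]
    outputs = []
    for image in images:
        image = np.asarray(image)  # assumed layout: (height, width, channels)
        if do_resize:
            height, width = image.shape[:2]
            new_h = height // size_divisor * size_divisor
            new_w = width // size_divisor * size_divisor
            # Nearest-neighbour sampling stands in for the PIL bilinear resize.
            rows = np.arange(new_h) * height // new_h
            cols = np.arange(new_w) * width // new_w
            image = image[rows][:, cols]
        if do_rescale:
            image = (image * (1 / 255)).astype(np.float32)
        outputs.append(image.transpose(2, 0, 1))  # channels_last -> channels_first
    return {"pixel_values": outputs}


white = np.full((70, 100, 3), 255, dtype=np.uint8)
result = preprocess_sketch(white)
first = result["pixel_values"][0]
```

For the 70 x 100 input, the output is a `float32` array of shape `(3, 64, 96)` with values close to 1.0. As in the diff, the result stays a list of per-image arrays unless a tensor type is requested.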
4 changes: 4 additions & 0 deletions src/transformers/utils/dummy_vision_objects.py
@@ -10,6 +10,10 @@ def __init__(self, *args, **kwargs):
         requires_backends(self, ["vision"])


+def rescale(*args, **kwargs):
+    requires_backends(rescale, ["vision"])
+
+
 def resize(*args, **kwargs):
     requires_backends(resize, ["vision"])
