[WIP] Improve multimodal processors - rely less on kwargs#28711
[WIP] Improve multimodal processors - rely less on kwargs#28711molbap wants to merge 36 commits intohuggingface:mainfrom
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
amyeroberts
left a comment
There was a problem hiding this comment.
Nice - looking good!
Let me know when you want another review 🤗
| do_rescale: bool = True, | ||
| rescale_factor: Union[int, float] = 1 / 255, | ||
| do_normalize: bool = True, | ||
| do_center_crop: bool = True, |
There was a problem hiding this comment.
What's the reason for changing the default here? I think BridgeTowerImageProcessor defaults to this being True, so would have this value, even if not passed here
There was a problem hiding this comment.
Yes, was conflicted on this... this is the current version of preprocess for bridgetower in main, missing a new variable declaration:
In that case, the
do_center_crop that was used in preprocess was the preprocess default, i.e. None instead of whatever was in the __init__, right?
There was a problem hiding this comment.
@amyeroberts on that, bridgetower did default do_center_cropto being True, but preprocess was not capturing it (it was a bug). The get_expected_values() method does not include mentions on center_crop and will crate expected uncropped values. From that
- Either change
...expected_values...getter to something that includes cropping in logic - change default of tester to match previous behaviour
| if len(args) > 0: | ||
| images = args[0] | ||
| args = args[1:] |
There was a problem hiding this comment.
It's very nice to see this go :)
| verbose=verbose, | ||
| ) | ||
| # add pixel_values + pixel_mask | ||
| print(size) |
There was a problem hiding this comment.
| print(size) |
| valid_processor_keys = { | ||
| "images", | ||
| "do_resize", | ||
| "size", | ||
| "size_divisor", | ||
| "resample", | ||
| "do_rescale", | ||
| "rescale_factor", | ||
| "do_normalize", | ||
| "image_mean", | ||
| "image_std", | ||
| "do_pad", | ||
| "do_center_crop", | ||
| "return_tensors", | ||
| "data_format", | ||
| "input_data_format", | ||
| "pad_and_return_pixel_mask", | ||
| } | ||
|
|
||
| unused_keys = set(kwargs.keys()) - valid_processor_keys | ||
| if unused_keys: | ||
| unused_key_str = ", ".join(unused_keys) | ||
| logger.info(f"Unused or unrecognized configuration parameters: {unused_key_str}.") |
There was a problem hiding this comment.
Two comments:
- What's the reason for this being added for some image processors but not others e.g. donut also accepts kwargs?
- Could we abstract out this check to something similar with Abstract image processor arg checks. #28843 using either inspect or an explicit class attribute?
There was a problem hiding this comment.
Good points!
- no reason at all - all for adding a similar logic to all!
inspectis a bit slow, right? But yes, also +1 to abstracting this away, I'll move it to another PR
There was a problem hiding this comment.
@amyeroberts was thinking about decorators: wdyt about having this as a wrapper/decorator in image_utils or elsewhere in utils?
valid_processor_keys = inspect.getfullargspec(self.preprocess)[0]
unused_keys = set(kwargs.keys()) - valid_processor_keys
if unused_keys:
unused_key_str = ", ".join(unused_keys)
logger.info(f"Unused or unrecognized configuration parameters: {unused_key_str}.")There was a problem hiding this comment.
LGTM!
Perhaps instead of inspecting, we could have a class attribute with the valid_processor_keys? e.g. something like this:
class FooImageProcessor:
_valid_processor_keys = ['bar', 'baz']
...
def preprocess(..., kwargs):
validate_kwargs(self._valid_processor_keys, kwargs)
Having a class attribute like this is something I think I'm going to end up with in the #28847 design
There was a problem hiding this comment.
sounds good yes! self-contained classes seem worth losing the decorator
| ) -> None: | ||
| if "pad_and_return_pixel_mask" in kwargs: | ||
| do_pad = kwargs.pop("pad_and_return_pixel_mask") | ||
| if pad_and_return_pixel_mask: |
There was a problem hiding this comment.
Let's set the default value for size in the init to avoid having mutables as default arguments
| if pad_and_return_pixel_mask: | |
| size = {"shortest_edge": 288} if size is None else size | |
| if pad_and_return_pixel_mask: |
| text_encoding = None | ||
|
|
||
| if text is not None: | ||
| self.current_processor = self.tokenizer |
There was a problem hiding this comment.
Current processor behaviour is deprecated, so we don't need to set it here. In fact, we should probably create a current_processor property which shows a deprecation message when used
| self.current_processor = self.tokenizer |
There was a problem hiding this comment.
Okay, that's good to know! Seen it in another instance I think, I'll drop it in that case and add the message
There was a problem hiding this comment.
Yes, it's definitely still around in places. For context, we used to have a behaviour when the current_processor was selected through a context manager. The context manager behaviour was removed, but there's still remnants of this, even though current_processor normally has no effect.
| text_encoding = None | ||
|
|
||
| if text is not None: | ||
| self.current_processor = self.tokenizer |
There was a problem hiding this comment.
Same comment here about current processors
| do_center_crop: Optional[bool] = True, | ||
| do_pad: Optional[bool] = True, |
There was a problem hiding this comment.
These shouldn't be optional in the init
| do_center_crop: Optional[bool] = True, | |
| do_pad: Optional[bool] = True, | |
| do_center_crop: bool = True, | |
| do_pad: bool = True, |
There was a problem hiding this comment.
absolutely, fixed in #28843 which should be merged before that one
What does this PR do?
This PR aims at a better control on the logic flow through
Processorclasses, in particular those leveragingImageProcessorwith aTokenizer. Linked with #27768.ImageProcessorscompared toNougat(as a reference point) have different signatures in their preprocess. One can list them hereThis helps standardize a bit in the first place, and then, will allow uniformizing
Processors.Fixes # (issue)
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Models: