[WIP] Uniformize processors in text+image multimodal models.#27768

Draft
molbap wants to merge 1 commit into huggingface:main from molbap:refactor_text_image_processors

@molbap (Contributor) commented Nov 30, 2023

What does this PR do?

This PR is a work in progress that aims to make all text-image multimodal processors uniform. Ideally, every model would be usable through AutoProcessor(...) or an equivalent.

The processor is one of the most fundamental building blocks of transformers, so it can only be modified through careful deprecation cycles. It is, however, an opportunity to enforce a design standard for future processing utilities and for downstream pipeline integrations.
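
As an illustration of what such a deprecation cycle could look like, here is a minimal sketch. It is not code from this PR. It assumes a hypothetical processor whose legacy positional order was (text, images) and whose target order is (images, text): legacy positional calls keep working but emit a FutureWarning. The class name, the swap heuristic and the warning text are all assumptions made for the example.

```python
import warnings


def _looks_like_text(value):
    """Return True for a str or a non-empty list/tuple of str."""
    if isinstance(value, str):
        return True
    return (
        isinstance(value, (list, tuple))
        and len(value) > 0
        and all(isinstance(item, str) for item in value)
    )


class ToyProcessor:
    """Hypothetical processor, used only to illustrate a deprecation shim."""

    def __call__(self, images=None, text=None, **kwargs):
        # Legacy callers passed text first. If the first argument looks like
        # text, swap the two inputs and warn instead of failing.
        if _looks_like_text(images):
            warnings.warn(
                "Passing text as the first positional argument is deprecated; "
                "use processor(images=..., text=...) instead.",
                FutureWarning,
                stacklevel=2,
            )
            images, text = text, images
        # A real processor would tokenize and preprocess here; the toy version
        # just echoes the normalized inputs.
        return {"images": images, "text": text, **kwargs}
```

Calling `ToyProcessor()("a photo of a cat", pixel_data)` would still work during the deprecation window, while keyword calls stay silent. The shim can be dropped once the cycle ends.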

For instance, align currently has this __call__ signature:

    def __call__(self, text=None, images=None, padding="max_length", max_length=64, return_tensors=None, **kwargs)

altclip has:

    def __call__(self, text=None, images=None, return_tensors=None, **kwargs)

blip has:

    def __call__(
        self,
        images: ImageInput = None,
        text: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]] = None,
        add_special_tokens: bool = True,
        padding: Union[bool, str, PaddingStrategy] = False,
        truncation: Union[bool, str, TruncationStrategy] = None,
        max_length: Optional[int] = None,
        stride: int = 0,
        pad_to_multiple_of: Optional[int] = None,
        return_attention_mask: Optional[bool] = None,
        return_overflowing_tokens: bool = False,
        return_special_tokens_mask: bool = False,
        return_offsets_mapping: bool = False,
        return_token_type_ids: bool = False,
        return_length: bool = False,
        verbose: bool = True,
        return_tensors: Optional[Union[str, TensorType]] = None,
        **kwargs,
    ) -> BatchEncoding:

And so on. A recent example is Kosmos-2:

    def __call__(
        self,
        images: ImageInput = None,
        text: Union[TextInput, List[TextInput]] = None,
        bboxes: BboxInput = None,
        num_image_tokens: Optional[int] = 64,
        first_image_token_id: Optional[int] = None,
        add_special_tokens: bool = True,
        add_eos_token: bool = False,
        padding: Union[bool, str, PaddingStrategy] = False,
        truncation: Union[bool, str, TruncationStrategy] = None,
        max_length: Optional[int] = None,
        pad_to_multiple_of: Optional[int] = None,
        return_attention_mask: Optional[bool] = None,
        return_length: bool = False,
        verbose: bool = True,
        return_tensors: Optional[Union[str, TensorType]] = None,
        **kwargs,
    ) -> BatchFeature:
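
To make the divergence concrete, the sketch below compares call signatures with `inspect.signature`. The two stand-in classes copy the align and altclip signatures quoted earlier. They are not the real transformers classes, and `call_params` and `signature_diff` are hypothetical helpers written for this example.

```python
import inspect


class AlignLike:
    # Stand-in reproducing the align signature quoted earlier.
    def __call__(self, text=None, images=None, padding="max_length",
                 max_length=64, return_tensors=None, **kwargs):
        ...


class AltClipLike:
    # Stand-in reproducing the altclip signature quoted earlier.
    def __call__(self, text=None, images=None, return_tensors=None, **kwargs):
        ...


def call_params(cls):
    """Map explicit __call__ parameter names to defaults (skips self and **kwargs)."""
    params = inspect.signature(cls.__call__).parameters
    return {
        name: p.default
        for name, p in params.items()
        if name != "self" and p.kind is not inspect.Parameter.VAR_KEYWORD
    }


def signature_diff(a, b):
    """Return names only in a, names only in b, and shared names whose defaults differ."""
    pa, pb = call_params(a), call_params(b)
    only_a = sorted(set(pa) - set(pb))
    only_b = sorted(set(pb) - set(pa))
    changed = sorted(n for n in set(pa) & set(pb) if pa[n] != pb[n])
    return only_a, only_b, changed


only_align, only_altclip, changed = signature_diff(AlignLike, AltClipLike)
print(only_align)  # explicit parameters that align has and altclip lacks
```

Running the same comparison over every processor would give a quick inventory of which arguments are explicit in some models and hidden behind **kwargs in others.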

Currently, 30 text + image models have a dedicated processing_<model> file. Each one has to be checked and then either modified or wrapped with a common class so that it becomes pipeline-compatible.

  • align
  • altclip
  • blip
  • blip_2
  • bridgetower
  • chinese_clip
  • clipseg
  • clip
  • donut
  • flava
  • fuyu
  • git
  • idefics
  • instructblip
  • kosmos2
  • layoutlmv2
  • layoutlmv3
  • layoutxlm
  • mgp_str
  • nougat
  • oneformer
  • owlv2
  • owlvit
  • perceiver
  • pix2struct
  • trocr
  • tvp
  • vilt
  • vision_text_dual_encoder
  • x_clip
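
One way to read "wrapped with a common class" is an adapter that exposes a single call signature and forwards by keyword to whatever the underlying processor expects. The sketch below is an assumption about how that could look, not the design adopted in this PR. `UniformProcessor` and the two toy processors are hypothetical names.

```python
import inspect


class UniformProcessor:
    """Hypothetical adapter exposing one call signature: (images, text, **kwargs)."""

    def __init__(self, processor):
        self.processor = processor
        params = inspect.signature(processor.__call__).parameters
        # Names the wrapped processor accepts as explicit keyword arguments.
        self._accepted = {
            name
            for name, p in params.items()
            if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                          inspect.Parameter.KEYWORD_ONLY)
        }
        # If it takes **kwargs, any extra option can be forwarded as-is.
        self._has_var_kwargs = any(
            p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
        )

    def __call__(self, images=None, text=None, **kwargs):
        if not self._has_var_kwargs:
            unknown = sorted(set(kwargs) - self._accepted)
            if unknown:
                raise TypeError(f"Unsupported options for wrapped processor: {unknown}")
        # Forward by keyword so differences in positional order do not matter.
        return self.processor(images=images, text=text, **kwargs)


class TextFirstProcessor:
    # Toy processor with text as the first parameter.
    def __call__(self, text=None, images=None, return_tensors=None, **kwargs):
        return {"text": text, "images": images, "return_tensors": return_tensors}


class ImagesFirstProcessor:
    # Toy processor with images first and no **kwargs.
    def __call__(self, images=None, text=None, padding=False):
        return {"text": text, "images": images, "padding": padding}
```

With this adapter, `UniformProcessor(TextFirstProcessor())("img", "caption")` and `UniformProcessor(ImagesFirstProcessor())("img", "caption")` both treat the first argument as the image. Modifying each processor in place is the alternative; it avoids an extra layer but requires a deprecation cycle per model.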

@github-actions (Contributor)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@LysandreJik (Member)

Still being worked on but a longer-term project; putting the WIP label so that the bot doesn't close it.

@LysandreJik added the WIP label Dec 31, 2023