Conversation

@zucchini-nlp (Member) commented Oct 6, 2025

What does this PR do?

Branches out from #40884 (comment) to make review and merge faster

Adds a class attribute to each model indicating its supported input and output modalities. Output modalities will be None when the model is not generative and "text" in most other cases; only a few models can generate audio or images. Note that for encoder-decoder models like Whisper, the input modalities contain both the encoder ("audio") and decoder ("text") modalities.

This will be used first for the pipeline, and we can extend usage later to a better testing suite and to preparing inputs in generation with multimodal LLMs (e.g. if we move multimodal encoding to GenerationMixin._prepare_multimodal_encodings). No tests are added at this point, because there is nothing to test yet.
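
To make the idea concrete, here is a minimal sketch (plain classes, not the actual diff) of how such attributes could look. The attribute names follow this PR's discussion; the per-model values are illustrative:

```python
# Minimal sketch, not the actual diff: per-model modality attributes.
# Attribute names follow the PR discussion; values are illustrative.
from typing import Optional, Tuple


class ModalityAttrsMixin:
    # Defaults: text in, text out
    input_modalities: Tuple[str, ...] = ("text",)
    output_modalities: Optional[Tuple[str, ...]] = ("text",)


class TextOnlyLMExample(ModalityAttrsMixin):
    pass  # inherits text -> text


class WhisperLikeExample(ModalityAttrsMixin):
    # Encoder-decoder: the encoder consumes audio, the decoder consumes text
    input_modalities = ("audio", "text")


class EncoderOnlyExample(ModalityAttrsMixin):
    output_modalities = None  # not generative, so no output modalities
```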

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@gante (Contributor) left a comment

I wonder if we can automate these variables, instead of having to manually define them. E.g. can we look at the signature of forward and, based on arguments present / type hints, determine modalities?

(fewer manual flags = smaller odds of human error = fewer bugs)
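
As an illustration of this suggestion, a rough sketch of signature-based inference. The argument-name-to-modality mapping below is an assumption for illustration, not an established transformers convention:

```python
# Rough sketch of inferring input modalities from a model's forward()
# signature. The name-to-modality mapping here is an assumption.
import inspect
from typing import Tuple

ARG_TO_MODALITY = {
    "input_ids": "text",
    "pixel_values": "image",
    "pixel_values_videos": "video",  # video arg names vary across models
    "input_features": "audio",
    "input_values": "audio",
}


def infer_input_modalities(model_cls) -> Tuple[str, ...]:
    """Map known forward() argument names to modalities."""
    params = inspect.signature(model_cls.forward).parameters
    found = {ARG_TO_MODALITY[p] for p in params if p in ARG_TO_MODALITY}
    return tuple(sorted(found))
```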

@zucchini-nlp (Member Author)

E.g. can we look at the signature of forward and, based on arguments present / type hints, determine modalities?

Yeah, I also thought of it. It is doable for most models, but there are some tricky ones as well. For example, we don't have a consistent naming convention for the video modality, and we have no way to tell what is output by a model with an overridden generate(). We could have a default for input_modalities as well, similar to output_modalities, but then manually override it for all models where the pattern does not match.
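
To illustrate that compromise, a hypothetical sketch of "infer by default, override by hand" via an `__init_subclass__` hook (the merged PR keeps the attributes fully explicit instead):

```python
# Hypothetical sketch of "infer by default, override by hand". The merged
# PR defines the attributes explicitly rather than using a hook like this.
import inspect
from typing import Tuple

_ARG_TO_MODALITY = {
    "input_ids": "text",
    "pixel_values": "image",
    "input_features": "audio",
}


class AutoModalityBase:
    input_modalities: Tuple[str, ...] = ("text",)

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Infer only when the subclass did not set the attribute itself;
        # tricky models (e.g. custom generate()) override it manually.
        if "input_modalities" not in cls.__dict__ and "forward" in cls.__dict__:
            params = inspect.signature(cls.forward).parameters
            found = {_ARG_TO_MODALITY[p] for p in params if p in _ARG_TO_MODALITY}
            cls.input_modalities = tuple(sorted(found)) or ("text",)
```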

@gante (Contributor) commented Oct 6, 2025

We could have a default for input_modalities as well, similar to output_modalities, but then manually override it for all models where the pattern does not match

imo this would be an improvement :) also an incentive to nudge contributors towards standard names and definitions!

@gante (Contributor) commented Oct 6, 2025

But check with @ArthurZucker before committing code!

(See tenet number 4: standardization for model definitions, abstractions for infra. I would place this under infra)

@adarshxs commented Oct 9, 2025

I think this is a decent approach. Waiting on this PR as it helps quite a lot: rather than depending on heuristics or some sort of registry, we want to get the input/output modalities supported by a model via its config. I hope we can adopt this as a standard soon @ArthurZucker

@zucchini-nlp (Member Author)

@bot /style

@github-actions (Contributor)

Style fix is beginning... View the workflow run here.

@ArthurZucker (Collaborator) left a comment

Explicit and non-automatic looks better for now IMO 😉

@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: aimv2, align, altclip, aria, audio_spectrogram_transformer, autoformer, aya_vision, bark, beit, bit, blip, blip_2, blt, bridgetower, chameleon, chinese_clip

@zucchini-nlp zucchini-nlp merged commit 1c36d40 into huggingface:main Oct 16, 2025
22 checks passed
ngazagna-qc pushed a commit to ngazagna-qc/transformers that referenced this pull request Oct 23, 2025
* update all models

* fix copies

* explanation comment

* better notation in omni model

* style

* fix copies

* output_modalities under generation mixin

* fix copies

* oh, glm4v also needs conversion
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026