[BC] Update get_(text|image|audio|video)_features methods to return BaseModelOutputWithPooling
#42564
Merged: ArthurZucker merged 143 commits into huggingface:main from tomaarsen:feat/normalize_get_features_methods on Jan 23, 2026
4c65977
Add return_dict to get_text_features methods to allow returning 'Base…
tomaarsen 47c2418
Add return_dict to get_image_features methods to allow returning 'Bas…
tomaarsen b6d6df3
make fixup
tomaarsen aa51419
Ignore discrepancies for pooler_output, focus on last_hidden_state
tomaarsen 278b068
Update get_image_features for the missing architectures
tomaarsen 3b14045
Update all get_audio_features
tomaarsen b7e0d66
Update get_video_features, except instructblipvideo
tomaarsen 41bcca8
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen 7eb89b6
Run ruff formatting
tomaarsen 57af63d
Patch Glm4v VisionModel forward with BaseModelOutputWithPooling
tomaarsen 7285187
Patch instructblip, although backwards incompatibility stands
tomaarsen fd7be52
Patch Kosmos2 and Ovis2
tomaarsen 3f183fd
Reformat Ovis2
tomaarsen 391aac9
Avoid now-deprecated return_attentions
tomaarsen f8c887f
Remove NumFrames
tomaarsen 9a251ce
Proposal to simplify get_..._features via TransformersKwargs & check_…
tomaarsen 858d9d4
Revert check_model_inputs, adopt can_return_tuple, accept BC on get_.…
tomaarsen 2a64303
Fix typo: can_return_dict -> can_return_tuple
tomaarsen fc8ee93
Adopt can_return_tuple for many get_image_features
tomaarsen 00aa0f5
Update all get_audio_features, some edge cases handled (e.g. gemma3n)
tomaarsen 1ccbf5a
Update most get_video_features, some edge case remain, e.g. instruct…
tomaarsen 78fa904
Patch Fuyu, just return BaseModelOutputWithPooling without pooler
tomaarsen f082a8e
Introduce ModelOutput subclass for Chameleon, patch get_image_features
tomaarsen 9ddd3b4
Update modeling files with new output formats for get_..._features
tomaarsen 006b2a5
Update fast_vlm modeling forward from modular llava to remove image_s…
tomaarsen afd5e64
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen 1d6639b
Update colqwen2 its self.vlm.model.visual call to expect BaseModelOutput
tomaarsen d52def3
Replace prior return_dict with check_model_inputs on qwen2_5_vl its V…
tomaarsen ff67663
Use BaseModelOutputWithProjectionAttentions for Kosmos2 to allow retu…
tomaarsen 22522c4
Update Emu akin to Chameleon
tomaarsen 37a53c3
Update the blip architectures with a naive fix
tomaarsen 440914b
Convert remaining modulars (emu3, janus), patch emu3
tomaarsen b6dbddd
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen 48353a5
Patch blip test
tomaarsen 531321c
Update deepseek_vl using a new BaseModelOutputWithHighResVisionEncodings
tomaarsen 70577d2
Remove 'copied' for blip_2, instructblip and kosmos2 as they required…
tomaarsen f6f90d6
Patch qwen3_vl and qwen3_vl_moe, where I used last_hidden_state inste…
tomaarsen 7af0b66
Run repo-consistency
tomaarsen 8db6370
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen cbe007b
Use kwargs["output_hidden_states"] = True to hardcode output_hidden_s…
tomaarsen 7c34c6e
Update new GlmAsr get_audio_features on ForConditionalGeneration
tomaarsen d9edd99
Run make style
tomaarsen 763ddf6
Try to add _can_record_outputs to florence2
tomaarsen 8420640
Override JanusVisionModel.forward to avoid bad q-former copy from Blip2
tomaarsen e0ea300
Import missing BaseModelOutput
tomaarsen 78bd0d0
Pop deprecated 'return_attentions', setting 'return_dict' won't be us…
tomaarsen d348d93
Reintroduce kwargs filtering in llava etc. for safety re. image_sizes
tomaarsen 71ea85a
Use BaseModelOutputWithPooling superclass consistently for custom get…
tomaarsen 8c59e95
Update Blip-2 family and its BaseModelOutputWithVisionQformerOutputs
tomaarsen 3fff252
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen 3f4c34b
Update glm4v _can_record_outputs
tomaarsen b39b6d1
Remove check_model_inputs in granite_speech
tomaarsen af0ccb1
Run make style
tomaarsen f8e08d9
Add _can_record_outputs to Ovis2VisionModel
tomaarsen 2d747d9
Update get_text_features/get_video_features from pe_video
tomaarsen 008e15d
Update missing case on sam3
tomaarsen e92efb9
Update get_text_features type hints to Union[tuple, BaseModelOutputWi…
tomaarsen b06a2d2
Add _can_record_inputs to qwen2_5_omni and qwen2_5_vl
tomaarsen 4a573af
Update get_image_features and get_video_features on ernie4_5_vl_moe
tomaarsen 2c677f9
Update get_image_features type hints to Union[tuple, BaseModelOutputW…
tomaarsen 1a8d14b
Remove @auto_docstring from pe_video, it's seemingly not used on that…
tomaarsen 87d22d3
Update get_video_features type hints to Union[tuple, BaseModelOutputW…
tomaarsen 8d5802e
Fix pe_video import issue
tomaarsen a9ff924
Update forward, test, and docstring for sam3
tomaarsen 8ad35e7
Update get_audio_features type hints to Union[tuple, BaseModelOutputW…
tomaarsen 7c99867
Add simple test case for get_text_features
tomaarsen 35feb85
First attempt to get get_image_features under test, still 26 failures
tomaarsen a64634b
Resolve several test failures, progress still slow and inconsistent
tomaarsen b5b334f
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen 5ad8ca5
Split up get_..._features tests more, should be simpler to disable/cu…
tomaarsen 0284715
Fix emu3 tests, also track non-temporal ResNet in hidden_states
tomaarsen be41c04
Patch chameleon, emu3, ernie4_5, janus
tomaarsen 2743053
Skip output_attentions for FastVLM, timm doesn't accept it
tomaarsen 76371d8
Patch groupvit, instructblip, ovis2
tomaarsen 88a5804
Patch paddleocr_vl, qwen2_5_omni, qwen2_5_vl, qwen2_vl, and skip test…
tomaarsen 13875af
Patch qwen3_omni_moe, sam family, edgetam
tomaarsen e480bc0
Kill now unused BaseModelOutputWithFeatureMaps
tomaarsen 2bd9a49
Remove left-over return_dict from prior attempt
tomaarsen 5455038
Allow for output_hidden_states in theory, but skip impossible tests
tomaarsen 3f75c03
Introduce tests for get_audio_features, fixed all architectures
tomaarsen 5e7d821
Introduce tests for get_video_features, only ernie4_5_vl_moe is failing
tomaarsen 1b8ab38
Call post_init on GraniteSpeechCTCEncoder, which was given a PreTrain…
tomaarsen 3467798
Update llava_onevision test suite, only create video pixel_values in …
tomaarsen 6f23bf5
Create custom video input for ernie4_5_vl_moe
tomaarsen a8e5f92
Skip CLIP family tests; they don't support output_hidden_states/outpu…
tomaarsen 508955e
Breaking: update Blip2Model.get_text_features to no longer output logits
tomaarsen df4d751
Satisfy test_num_layers_is_small test for align
tomaarsen 1254b29
Test against last_hidden_state against batch_size and hidden_size
tomaarsen c8b712f
Skip last_hidden_state shape tests for unusual cases
tomaarsen d6f0fb9
Update docstrings via auto_docstring for all get_..._features methods
tomaarsen 51638d6
Ensure all auto_doc arguments are documented
tomaarsen af3b70f
Remove redundant docstrings
tomaarsen 4d522c7
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen 3564045
Also patch the new glm_image for get_image_features/output_hidden_states
tomaarsen f7100d3
Update modular files as per check_docstring rules ...
tomaarsen a41491f
Update glm-image dates via fix-repo
tomaarsen de56122
FloatTensor -> LongTensor for image_tokens
tomaarsen d6fd917
Add simple last_hidden_state description, fix output typing of Gemma3…
tomaarsen 7329ebc
Add missing `-> tuple | BaseModel...` on check_model_inputs
tomaarsen 72a9ac9
Ensure forward typing with check_model_inputs is `-> tuple | BaseMode…
tomaarsen 9b67014
Undo accidental rename of Ovis2VisionAttention
tomaarsen cd88179
Fix incorrect type hints for blip family
tomaarsen b58f3c5
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen e747669
Patch get_image_features for lighton_ocr
tomaarsen 95a55ad
Explicitly use Ovis2VisionAttention in Ovis2VisionEncoderLayer in mod…
tomaarsen ef77832
Update use of get_image_features for lighton_ocr
tomaarsen 194a1bd
Rerun python utils/add_dates.py
tomaarsen 0ce7bac
Remove tie_last_hidden_states=False from check_model_inputs from ...
tomaarsen 6604784
Revert accidental metaclip import change
tomaarsen 0746344
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen ed5c136
Add missing return_dict=True in get_..._features methods
tomaarsen 3f0c754
Add `output_hidden_states=True` in InternVL get_image_features
tomaarsen 061527d
Add missing docstring for llava_next_video get_video_features
tomaarsen af776e9
Quick clean-up in _video_features_prepare_config_and_inputs test helper
tomaarsen 125a49d
model.set_attn_implementation instead of config._attn_implementation
tomaarsen 71f9f76
Add simple docstring to some helper methods re. inputs.
tomaarsen c69c4c5
Explain why get_..._features test inputs are overridden
tomaarsen 72891b9
Undo incorrect return_dict=True change in deepseek_vl_hybrid
tomaarsen 0d61f66
Revert accidental metaclip import change
tomaarsen fa32eff
Adopt **vision_outputs in instructblip, but mess remains
tomaarsen 1a381aa
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen a1e6767
Avoid kwargs["output_hidden_states"] = True in get_..._features methods
tomaarsen d9001cc
Update check_model_inputs to default vision args based on config
tomaarsen 0923216
Unrelated but important: patch set_attn_implementation for Windows
tomaarsen e3b774e
Revert output_hidden_states changes on InternVL
tomaarsen 37a495c
Extend d9001cc (check_model_inputs); remove more vision_feature_layer…
tomaarsen bf9182d
Patch unusual bug: llava_next_video used self.vision_feature_layer
tomaarsen 15c2a59
Add unused use_cache to TimmWrapperModel to patch FastVLM
tomaarsen 92fe926
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen d860470
Update check_config_attributes to allow for vision attributes
tomaarsen 45d2c33
Add tests for config.return_dict=False
tomaarsen 5199c47
permute and quantize separately for the comment
tomaarsen 9865895
Ditch shared custom_args for ernie4_5_vl_moe
tomaarsen 276dcaa
Move Ernie4_5_VL_MoeVisionAttention next to VisionBlock
tomaarsen c804de4
Add missing "attentions" from Florence2 _can_record_outputs
tomaarsen 72a1a09
Clarify kwargs.get("image_sizes") in modeling_llava
tomaarsen 43ec4b3
Remove commented skip_test_image_features_output_shape in chameleon t…
tomaarsen 4515b29
Add a migration guide under 'Library-wide changes with lesser impact'
tomaarsen cd4c0cb
Parameterize get_..._features tests with return_dict (True, False, N…
tomaarsen 292ef3a
Add comment re. TimmWrapper _can_record_outputs
tomaarsen 355bcb4
Shrink Gemma3nAudioEncoderModelOutput with auto_docstring & superclass
tomaarsen bf0ae70
Revert "Unrelated but important: patch set_attn_implementation for Wi…
tomaarsen d8e786f
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen
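Several commits above mention adopting `can_return_tuple` and testing `return_dict` as `True`, `False`, and `None`. The following is a minimal, library-free sketch of that general pattern, not the actual transformers implementation: `PooledOutput`, `returns_tuple_on_request`, `ToyModel`, and `config_return_dict` are hypothetical names invented for illustration, and the real decorator and `BaseModelOutputWithPooling` may behave differently in detail.

```python
from dataclasses import dataclass, fields
from functools import wraps
from typing import Any, Optional


@dataclass
class PooledOutput:
    """Hypothetical stand-in for BaseModelOutputWithPooling (not the real class)."""

    last_hidden_state: Any = None
    pooler_output: Any = None

    def to_tuple(self) -> tuple:
        # Keep only the fields that were actually populated.
        values = (getattr(self, f.name) for f in fields(self))
        return tuple(v for v in values if v is not None)


def returns_tuple_on_request(method):
    """Simplified illustration of a can_return_tuple-style decorator."""

    @wraps(method)
    def wrapper(self, *args, return_dict: Optional[bool] = None, **kwargs):
        # None means "fall back to the model-level default".
        if return_dict is None:
            return_dict = self.config_return_dict
        output = method(self, *args, **kwargs)
        return output if return_dict else output.to_tuple()

    return wrapper


class ToyModel:
    """Hypothetical model; config_return_dict stands in for config.return_dict."""

    def __init__(self, config_return_dict: bool = True):
        self.config_return_dict = config_return_dict

    @returns_tuple_on_request
    def get_text_features(self, token_ids):
        hidden = [[float(t)] for t in token_ids]    # fake per-token states
        pooled = [sum(token_ids) / len(token_ids)]  # fake pooled embedding
        return PooledOutput(last_hidden_state=hidden, pooler_output=pooled)
```

In this sketch the method body always builds the structured output, and the decorator alone decides whether the caller sees the object or a plain tuple, which keeps each `get_..._features` method free of `return_dict` branching.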
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -175,3 +175,4 @@ print(response) |
| | | |
| | | [[autodoc]] AriaForConditionalGeneration |
| | | - forward |
| | | - get_image_features |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -294,3 +294,4 @@ Tracked 2 objects through 200 frames |
| | | |
| | | [[autodoc]] EdgeTamVideoModel |
| | | - forward |
| | | - get_image_features |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -291,3 +291,4 @@ alt="drawing" width="600"/> |
| | | |
| | | [[autodoc]] GotOcr2ForConditionalGeneration |
| | | - forward |
| | | - get_image_features |
We have the updated signatures but no general documentation on usage or tips. This will become a hidden feature this way; we should promote it (we already have way too many undocumented features 😭)

Agreed, I'll try to see if `auto_docstring` has a default intro that can be partially reused for the `get_..._features` methods.

Can we also add another file just for general guidance on this? `auto_docstring` is one thing, but a general introduction would be nice as well. It doesn't have to be big, just something that gives the gist and details what embeddings are expected, etc.
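As one possible starting point for the kind of usage guidance requested above, here is a hedged sketch of how calling code might cope with both return styles during a migration. It assumes that older versions handed back the embedding directly while the updated methods wrap it in an object exposing `pooler_output`; `FakePooledOutput` and `pooled_embedding` are hypothetical names for illustration, and plain lists stand in for tensors.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakePooledOutput:
    """Hypothetical stand-in for BaseModelOutputWithPooling."""

    last_hidden_state: Any
    pooler_output: Any


def pooled_embedding(features: Any) -> Any:
    """Return the pooled embedding from either return style.

    Assumption: the older behaviour returned the embedding itself, while
    the newer one returns an object with a `pooler_output` attribute.
    """
    pooled = getattr(features, "pooler_output", None)
    # Fall back to the input when there is no populated pooler_output.
    return features if pooled is None else pooled


# Old-style call result: the embedding itself.
legacy = [0.1, 0.2]
# New-style call result: a structured output wrapping the embedding.
wrapped = FakePooledOutput(last_hidden_state=[[0.1], [0.2]], pooler_output=[0.3])

print(pooled_embedding(legacy))
print(pooled_embedding(wrapped))
```

A helper like this is only useful while code has to support both behaviours; once a project depends solely on the updated methods, reading `pooler_output` (or `last_hidden_state`) directly is simpler.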