Merged
Commits
143 commits
4c65977
Add return_dict to get_text_features methods to allow returning 'Base…
tomaarsen Dec 2, 2025
47c2418
Add return_dict to get_image_features methods to allow returning 'Bas…
tomaarsen Dec 2, 2025
b6d6df3
make fixup
tomaarsen Dec 2, 2025
aa51419
Ignore discrepancies for pooler_output, focus on last_hidden_state
tomaarsen Dec 4, 2025
278b068
Update get_image_features for the missing architectures
tomaarsen Dec 12, 2025
3b14045
Update all get_audio_features
tomaarsen Dec 12, 2025
b7e0d66
Update get_video_features, except instructblipvideo
tomaarsen Dec 12, 2025
41bcca8
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen Dec 12, 2025
7eb89b6
Run ruff formatting
tomaarsen Dec 12, 2025
57af63d
Patch Glm4v VisionModel forward with BaseModelOutputWithPooling
tomaarsen Dec 12, 2025
7285187
Patch instructblip, although backwards incompatibility stands
tomaarsen Dec 12, 2025
fd7be52
Patch Kosmos2 and Ovis2
tomaarsen Dec 12, 2025
3f183fd
Reformat Ovis2
tomaarsen Dec 12, 2025
391aac9
Avoid now-deprecated return_attentions
tomaarsen Dec 12, 2025
f8c887f
Remove NumFrames
tomaarsen Dec 16, 2025
9a251ce
Proposal to simplify get_..._features via TransformersKwargs & check_…
tomaarsen Dec 16, 2025
858d9d4
Revert check_model_inputs, adopt can_return_tuple, accept BC on get_.…
tomaarsen Dec 16, 2025
2a64303
Fix typo: can_return_dict -> can_return_tuple
tomaarsen Dec 16, 2025
fc8ee93
Adopt can_return_tuple for many get_image_features
tomaarsen Dec 16, 2025
00aa0f5
Update all get_audio_features, some edge cases handled (e.g. gemma3n)
tomaarsen Dec 16, 2025
1ccbf5a
Update most get_video_features, some edge case remain, e.g. instruct…
tomaarsen Dec 16, 2025
78fa904
Patch Fuyu, just return BaseModelOutputWithPooling without pooler
tomaarsen Dec 16, 2025
f082a8e
Introduce ModelOutput subclass for Chameleon, patch get_image_features
tomaarsen Dec 16, 2025
9ddd3b4
Update modeling files with new output formats for get_..._features
tomaarsen Dec 17, 2025
006b2a5
Update fast_vlm modeling forward from modular llava to remove image_s…
tomaarsen Dec 17, 2025
afd5e64
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen Dec 18, 2025
1d6639b
Update colqwen2 its self.vlm.model.visual call to expect BaseModelOutput
tomaarsen Dec 18, 2025
d52def3
Replace prior return_dict with check_model_inputs on qwen2_5_vl its V…
tomaarsen Dec 18, 2025
ff67663
Use BaseModelOutputWithProjectionAttentions for Kosmos2 to allow retu…
tomaarsen Dec 18, 2025
22522c4
Update Emu akin to Chameleon
tomaarsen Dec 18, 2025
37a53c3
Update the blip architectures with a naive fix
tomaarsen Dec 18, 2025
440914b
Convert remaining modulars (emu3, janus), patch emu3
tomaarsen Dec 18, 2025
b6dbddd
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen Dec 18, 2025
48353a5
Patch blip test
tomaarsen Dec 18, 2025
531321c
Update deepseek_vl using a new BaseModelOutputWithHighResVisionEncodings
tomaarsen Dec 18, 2025
70577d2
Remove 'copied' for blip_2, instructblip and kosmos2 as they required…
tomaarsen Dec 18, 2025
f6f90d6
Patch qwen3_vl and qwen3_vl_moe, where I used last_hidden_state inste…
tomaarsen Dec 18, 2025
7af0b66
Run repo-consistency
tomaarsen Dec 18, 2025
8db6370
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen Dec 31, 2025
cbe007b
Use kwargs["output_hidden_states"] = True to hardcode output_hidden_s…
tomaarsen Dec 31, 2025
7c34c6e
Update new GlmAsr get_audio_features on ForConditionalGeneration
tomaarsen Dec 31, 2025
d9edd99
Run make style
tomaarsen Dec 31, 2025
763ddf6
Try to add _can_record_outputs to florence2
tomaarsen Dec 31, 2025
8420640
Override JanusVisionModel.forward to avoid bad q-former copy from Blip2
tomaarsen Dec 31, 2025
e0ea300
Import missing BaseModelOutput
tomaarsen Dec 31, 2025
78bd0d0
Pop deprecated 'return_attentions', setting 'return_dict' won't be us…
tomaarsen Dec 31, 2025
d348d93
Reintroduce kwargs filtering in llava etc. for safety re. image_sizes
tomaarsen Dec 31, 2025
71ea85a
Use BaseModelOutputWithPooling superclass consistently for custom get…
tomaarsen Dec 31, 2025
8c59e95
Update Blip-2 family and its BaseModelOutputWithVisionQformerOutputs
tomaarsen Jan 7, 2026
3fff252
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen Jan 7, 2026
3f4c34b
Update glm4v _can_record_outputs
tomaarsen Jan 7, 2026
b39b6d1
Remove check_model_inputs in granite_speech
tomaarsen Jan 7, 2026
af0ccb1
Run make style
tomaarsen Jan 7, 2026
f8e08d9
Add _can_record_outputs to Ovis2VisionModel
tomaarsen Jan 7, 2026
2d747d9
Update get_text_features/get_video_features from pe_video
tomaarsen Jan 7, 2026
008e15d
Update missing case on sam3
tomaarsen Jan 7, 2026
e92efb9
Update get_text_features type hints to Union[tuple, BaseModelOutputWi…
tomaarsen Jan 7, 2026
b06a2d2
Add _can_record_inputs to qwen2_5_omni and qwen2_5_vl
tomaarsen Jan 7, 2026
4a573af
Update get_image_features and get_video_features on ernie4_5_vl_moe
tomaarsen Jan 7, 2026
2c677f9
Update get_image_features type hints to Union[tuple, BaseModelOutputW…
tomaarsen Jan 7, 2026
1a8d14b
Remove @auto_docstring from pe_video, it's seemingly not used on that…
tomaarsen Jan 7, 2026
87d22d3
Update get_video_features type hints to Union[tuple, BaseModelOutputW…
tomaarsen Jan 7, 2026
8d5802e
Fix pe_video import issue
tomaarsen Jan 7, 2026
a9ff924
Update forward, test, and docstring for sam3
tomaarsen Jan 7, 2026
8ad35e7
Update get_audio_features type hints to Union[tuple, BaseModelOutputW…
tomaarsen Jan 7, 2026
7c99867
Add simple test case for get_text_features
tomaarsen Jan 8, 2026
35feb85
First attempt to get get_image_features under test, still 26 failures
tomaarsen Jan 8, 2026
a64634b
Resolve several test failures, progress still slow and inconsistent
tomaarsen Jan 9, 2026
b5b334f
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen Jan 12, 2026
5ad8ca5
Split up get_..._features tests more, should be simpler to disable/cu…
tomaarsen Jan 12, 2026
0284715
Fix emu3 tests, also track non-temporal ResNet in hidden_states
tomaarsen Jan 12, 2026
be41c04
Patch chameleon, emu3, ernie4_5, janus
tomaarsen Jan 12, 2026
2743053
Skip output_attentions for FastVLM, timm doesn't accept it
tomaarsen Jan 12, 2026
76371d8
Patch groupvit, instructblip, ovis2
tomaarsen Jan 12, 2026
88a5804
Patch paddleocr_vl, qwen2_5_omni, qwen2_5_vl, qwen2_vl, and skip test…
tomaarsen Jan 12, 2026
13875af
Patch qwen3_omni_moe, sam family, edgetam
tomaarsen Jan 12, 2026
e480bc0
Kill now unused BaseModelOutputWithFeatureMaps
tomaarsen Jan 12, 2026
2bd9a49
Remove left-over return_dict from prior attempt
tomaarsen Jan 12, 2026
5455038
Allow for output_hidden_states in theory, but skip impossible tests
tomaarsen Jan 12, 2026
3f75c03
Introduce tests for get_audio_features, fixed all architectures
tomaarsen Jan 12, 2026
5e7d821
Introduce tests for get_video_features, only ernie4_5_vl_moe is failing
tomaarsen Jan 12, 2026
1b8ab38
Call post_init on GraniteSpeechCTCEncoder, which was given a PreTrain…
tomaarsen Jan 12, 2026
3467798
Update llava_onevision test suite, only create video pixel_values in …
tomaarsen Jan 13, 2026
6f23bf5
Create custom video input for ernie4_5_vl_moe
tomaarsen Jan 13, 2026
a8e5f92
Skip CLIP family tests; they don't support output_hidden_states/outpu…
tomaarsen Jan 13, 2026
508955e
Breaking: update Blip2Model.get_text_features to no longer output logits
tomaarsen Jan 13, 2026
df4d751
Satisfy test_num_layers_is_small test for align
tomaarsen Jan 13, 2026
1254b29
Test against last_hidden_state against batch_size and hidden_size
tomaarsen Jan 13, 2026
c8b712f
Skip last_hidden_state shape tests for unusual cases
tomaarsen Jan 14, 2026
d6f0fb9
Update docstrings via auto_docstring for all get_..._features methods
tomaarsen Jan 14, 2026
51638d6
Ensure all auto_doc arguments are documented
tomaarsen Jan 14, 2026
af3b70f
Remove redundant docstrings
tomaarsen Jan 14, 2026
4d522c7
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen Jan 14, 2026
3564045
Also patch the new glm_image for get_image_features/output_hidden_states
tomaarsen Jan 14, 2026
f7100d3
Update modular files as per check_docstring rules ...
tomaarsen Jan 14, 2026
a41491f
Update glm-image dates via fix-repo
tomaarsen Jan 14, 2026
de56122
FloatTensor -> LongTensor for image_tokens
tomaarsen Jan 15, 2026
d6fd917
Add simple last_hidden_state description, fix output typing of Gemma3…
tomaarsen Jan 15, 2026
7329ebc
Add missing `-> tuple | BaseModel...` on check_model_inputs
tomaarsen Jan 15, 2026
72a9ac9
Ensure forward typing with check_model_inputs is `-> tuple | BaseMode…
tomaarsen Jan 15, 2026
9b67014
Undo accidental rename of Ovis2VisionAttention
tomaarsen Jan 15, 2026
cd88179
Fix incorrect type hints for blip family
tomaarsen Jan 15, 2026
b58f3c5
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen Jan 15, 2026
e747669
Patch get_image_features for lighton_ocr
tomaarsen Jan 15, 2026
95a55ad
Explicitly use Ovis2VisionAttention in Ovis2VisionEncoderLayer in mod…
tomaarsen Jan 15, 2026
ef77832
Update use of get_image_features for lighton_ocr
tomaarsen Jan 15, 2026
194a1bd
Rerun python utils/add_dates.py
tomaarsen Jan 15, 2026
0ce7bac
Remove tie_last_hidden_states=False from check_model_inputs from ...
tomaarsen Jan 15, 2026
6604784
Revert accidental metaclip import change
tomaarsen Jan 19, 2026
0746344
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen Jan 19, 2026
ed5c136
Add missing return_dict=True in get_..._features methods
tomaarsen Jan 19, 2026
3f0c754
Add `output_hidden_states=True` in InternVL get_image_features
tomaarsen Jan 19, 2026
061527d
Add missing docstring for llava_next_video get_video_features
tomaarsen Jan 19, 2026
af776e9
Quick clean-up in _video_features_prepare_config_and_inputs test helper
tomaarsen Jan 19, 2026
125a49d
model.set_attn_implementation instead of config._attn_implementation
tomaarsen Jan 19, 2026
71f9f76
Add simple docstring to some helper methods re. inputs.
tomaarsen Jan 19, 2026
c69c4c5
Explain why get_..._features test inputs are overridden
tomaarsen Jan 19, 2026
72891b9
Undo incorrect return_dict=True change in deepseek_vl_hybrid
tomaarsen Jan 19, 2026
0d61f66
Revert accidental metaclip import change
tomaarsen Jan 19, 2026
fa32eff
Adopt **vision_outputs in instructblip, but mess remains
tomaarsen Jan 19, 2026
1a381aa
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen Jan 22, 2026
a1e6767
Avoid kwargs["output_hidden_states"] = True in get_..._features methods
tomaarsen Jan 22, 2026
d9001cc
Update check_model_inputs to default vision args based on config
tomaarsen Jan 22, 2026
0923216
Unrelated but important: patch set_attn_implementation for Windows
tomaarsen Jan 22, 2026
e3b774e
Revert output_hidden_states changes on InternVL
tomaarsen Jan 22, 2026
37a495c
Extend d9001cc (check_model_inputs); remove more vision_feature_layer…
tomaarsen Jan 22, 2026
bf9182d
Patch unusual bug: llava_next_video used self.vision_feature_layer
tomaarsen Jan 22, 2026
15c2a59
Add unused use_cache to TimmWrapperModel to patch FastVLM
tomaarsen Jan 22, 2026
92fe926
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen Jan 22, 2026
d860470
Update check_config_attributes to allow for vision attributes
tomaarsen Jan 22, 2026
45d2c33
Add tests for config.return_dict=False
tomaarsen Jan 22, 2026
5199c47
permute and quantize separately for the comment
tomaarsen Jan 22, 2026
9865895
Ditch shared custom_args for ernie4_5_vl_moe
tomaarsen Jan 22, 2026
276dcaa
Move Ernie4_5_VL_MoeVisionAttention next to VisionBlock
tomaarsen Jan 22, 2026
c804de4
Add missing "attentions" from Florence2 _can_record_outputs
tomaarsen Jan 22, 2026
72a1a09
Clarify kwargs.get("image_sizes") in modeling_llava
tomaarsen Jan 22, 2026
43ec4b3
Remove commented skip_test_image_features_output_shape in chameleon t…
tomaarsen Jan 22, 2026
4515b29
Add a migration guide under 'Library-wide changes with lesser impact'
tomaarsen Jan 22, 2026
cd4c0cb
Parameterize get_..._features tests with return_dict (True, False, N…
tomaarsen Jan 23, 2026
292ef3a
Add comment re. TimmWrapper _can_record_outputs
tomaarsen Jan 23, 2026
355bcb4
Shrink Gemma3nAudioEncoderModelOutput with auto_docstring & superclass
tomaarsen Jan 23, 2026
bf0ae70
Revert "Unrelated but important: patch set_attn_implementation for Wi…
tomaarsen Jan 23, 2026
d8e786f
Merge branch 'main' into feat/normalize_get_features_methods
tomaarsen Jan 23, 2026
35 changes: 33 additions & 2 deletions MIGRATION_GUIDE_V5.md
@@ -453,6 +453,37 @@ We dropped support for two torch APIs:

Those APIs were deprecated by the PyTorch team, and we're instead focusing on the supported APIs `dynamo` and `export`.

### Feature extraction helpers: `get_*_features`

Many multi-modal models expose convenience methods such as `get_text_features`, `get_image_features`, `get_audio_features`, and `get_video_features` to run inference on a single modality without calling `model(**inputs)` directly.

Starting with v5, these 4 helper methods now return a `BaseModelOutputWithPooling` (or a subclass) instead of only a pooled embedding tensor:

- `last_hidden_state`: unpooled token/patch/frame embeddings for the requested modality.
- `pooler_output`: pooled representation (what most models previously returned from `get_*_features`).
- `hidden_states`: full hidden states for all layers when `output_hidden_states=True` is passed.
- `attentions`: attention maps when `output_attentions=True` is passed.

> [!IMPORTANT]
> There is **no single universal shape** for `last_hidden_state` or `pooler_output`. It's recommended to inspect a small forward pass before making assumptions about shapes or semantics.

If your code previously did something like this:

```python
text_embeddings = model.get_text_features(**inputs)
```

and you used `text_embeddings` as a tensor, you should now explicitly pass `return_dict=True` and take the `pooler_output` field from the returned `BaseModelOutputWithPooling`:

```python
outputs = model.get_text_features(**inputs, return_dict=True)
text_embeddings = outputs.pooler_output
```

This will match the previous behavior in the large majority of cases. If your model-specific implementation returned a tuple of results before, those values should now be accessible as fields on the corresponding `BaseModelOutputWithPooling` subclass.
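The relationship between the old and new return values can be sketched with a small self-contained example. Note that `FakeBaseModelOutputWithPooling` is a hypothetical stand-in for the real `BaseModelOutputWithPooling` (plain lists replace tensors here, purely for illustration):

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for transformers.modeling_outputs.BaseModelOutputWithPooling,
# with plain lists instead of torch tensors (illustration only).
@dataclass
class FakeBaseModelOutputWithPooling:
    last_hidden_state: list                # unpooled token/patch embeddings
    pooler_output: Optional[list] = None   # pooled embedding (the old return value)
    hidden_states: Optional[tuple] = None  # per-layer states, if requested
    attentions: Optional[tuple] = None     # attention maps, if requested

# Pre-v5: get_text_features(**inputs) returned the pooled embedding directly.
legacy_embeddings = [0.1, 0.2, 0.3]

# v5: the same values now live on a structured output object.
outputs = FakeBaseModelOutputWithPooling(
    last_hidden_state=[[0.5, 0.6], [0.7, 0.8]],
    pooler_output=[0.1, 0.2, 0.3],
)

# Migration: read the pooled embedding from the `pooler_output` field.
text_embeddings = outputs.pooler_output
assert text_embeddings == legacy_embeddings
```

The real output class behaves the same way for attribute access; the optional fields stay `None` unless the corresponding `output_hidden_states=True` / `output_attentions=True` flags are passed.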

Linked PR: https://github.com/huggingface/transformers/pull/42564

## Quantization changes

We clean up the quantization API in transformers, and significantly refactor the weight loading as highlighted
@@ -558,7 +589,7 @@ Linked PRs:
- `use_mps_device` -> mps will be used by default if detected
- `fp16_backend` and `half_precision_backend` -> we will only rely on torch.amp as everything has been upstream to torch
- `no_cuda` -> `use_cpu`
- `include_tokens_per_second` -> `include_num_input_tokens_seen`
- `use_legacy_prediction_loop` -> we only use `evaluation_loop` function from now on

### Removing deprecated arguments in `Trainer`
@@ -574,7 +605,7 @@

### New defaults for `Trainer`

- `use_cache` in the model config will be set to `False`. You can still change the cache value through `TrainingArguments` `use_cache` argument if needed.

## Pipelines

2 changes: 2 additions & 0 deletions docs/source/en/model_doc/aimv2.md
Contributor: We have the updated signatures but no documents in general on the usage or general tips. This will become a hidden feature this way, we should promote this (we already have way too much undocumented features 😭)

Member Author: Agreed, I'll try to see if auto_docstring has a default intro that can be partially reused for the get_..._features methods.

Contributor: Can we also add another file just for general guidance on this - autodocstring is one thing but just a general introduction would be nice as well. Doesn't have to big but giving a gist and detail what embeddings are expected etc

@@ -89,6 +89,8 @@ probs = outputs.logits_per_image.softmax(dim=-1)

[[autodoc]] Aimv2Model
- forward
- get_text_features
- get_image_features

## Aimv2VisionModel

1 change: 1 addition & 0 deletions docs/source/en/model_doc/aria.md
@@ -175,3 +175,4 @@ print(response)

[[autodoc]] AriaForConditionalGeneration
- forward
- get_image_features
1 change: 1 addition & 0 deletions docs/source/en/model_doc/audioflamingo3.md
@@ -401,3 +401,4 @@ are forwarded, so you can tweak padding or tensor formats just like when calling

[[autodoc]] AudioFlamingo3ForConditionalGeneration
- forward
- get_audio_features
1 change: 1 addition & 0 deletions docs/source/en/model_doc/aya_vision.md
@@ -274,3 +274,4 @@ print(processor.tokenizer.decode(generated[0], skip_special_tokens=True))

[[autodoc]] AyaVisionForConditionalGeneration
- forward
- get_image_features
1 change: 1 addition & 0 deletions docs/source/en/model_doc/blip-2.md
@@ -97,6 +97,7 @@ If you're interested in submitting a resource to be included here, please feel f
[[autodoc]] Blip2ForConditionalGeneration
- forward
- generate
- get_image_features

## Blip2ForImageTextRetrieval

2 changes: 2 additions & 0 deletions docs/source/en/model_doc/chameleon.md
@@ -203,8 +203,10 @@ model = ChameleonForConditionalGeneration.from_pretrained(

[[autodoc]] ChameleonModel
- forward
- get_image_features

## ChameleonForConditionalGeneration

[[autodoc]] ChameleonForConditionalGeneration
- forward
- get_image_features
2 changes: 2 additions & 0 deletions docs/source/en/model_doc/cohere2_vision.md
@@ -125,11 +125,13 @@ print(outputs)

[[autodoc]] Cohere2VisionForConditionalGeneration
- forward
- get_image_features

## Cohere2VisionModel

[[autodoc]] Cohere2VisionModel
- forward
- get_image_features

## Cohere2VisionImageProcessorFast

1 change: 1 addition & 0 deletions docs/source/en/model_doc/deepseek_vl.md
@@ -223,6 +223,7 @@ model = DeepseekVLForConditionalGeneration.from_pretrained(

[[autodoc]] DeepseekVLModel
- forward
- get_image_features

## DeepseekVLForConditionalGeneration

1 change: 1 addition & 0 deletions docs/source/en/model_doc/deepseek_vl_hybrid.md
@@ -222,6 +222,7 @@ model = DeepseekVLHybridForConditionalGeneration.from_pretrained(

[[autodoc]] DeepseekVLHybridModel
- forward
- get_image_features

## DeepseekVLHybridForConditionalGeneration

1 change: 1 addition & 0 deletions docs/source/en/model_doc/edgetam.md
@@ -330,3 +330,4 @@ EdgeTAM can use masks from previous predictions as input to refine segmentation:

[[autodoc]] EdgeTamModel
- forward
- get_image_features
1 change: 1 addition & 0 deletions docs/source/en/model_doc/edgetam_video.md
@@ -294,3 +294,4 @@ Tracked 2 objects through 200 frames

[[autodoc]] EdgeTamVideoModel
- forward
- get_image_features
4 changes: 4 additions & 0 deletions docs/source/en/model_doc/ernie4_5_vl_moe.md
@@ -222,8 +222,12 @@ print(output_text)

[[autodoc]] Ernie4_5_VL_MoeModel
- forward
- get_video_features
- get_image_features

## Ernie4_5_VL_MoeForConditionalGeneration

[[autodoc]] Ernie4_5_VL_MoeForConditionalGeneration
- forward
- get_video_features
- get_image_features
1 change: 1 addition & 0 deletions docs/source/en/model_doc/fast_vlm.md
@@ -171,3 +171,4 @@ Flash Attention 2 is an even faster, optimized version of the previous optimizat

[[autodoc]] FastVlmForConditionalGeneration
- forward
- get_image_features
2 changes: 2 additions & 0 deletions docs/source/en/model_doc/florence2.md
@@ -177,11 +177,13 @@ print(parsed_answer)

[[autodoc]] Florence2Model
- forward
- get_image_features

## Florence2ForConditionalGeneration

[[autodoc]] Florence2ForConditionalGeneration
- forward
- get_image_features

## Florence2VisionBackbone

1 change: 1 addition & 0 deletions docs/source/en/model_doc/gemma3.md
@@ -271,6 +271,7 @@ visualizer("<img>What is shown in this image?")

[[autodoc]] Gemma3ForConditionalGeneration
- forward
- get_image_features

## Gemma3ForSequenceClassification

3 changes: 3 additions & 0 deletions docs/source/en/model_doc/gemma3n.md
@@ -188,6 +188,8 @@ echo -e "Plants create energy through a process known as" | transformers run --t

[[autodoc]] Gemma3nModel
- forward
- get_image_features
- get_audio_features

## Gemma3nForCausalLM

@@ -198,6 +200,7 @@ echo -e "Plants create energy through a process known as" | transformers run --t

[[autodoc]] Gemma3nForConditionalGeneration
- forward
- get_image_features

[altup]: https://proceedings.neurips.cc/paper_files/paper/2023/hash/f2059277ac6ce66e7e5543001afa8bb5-Abstract-Conference.html
[attention-mask-viz]: https://github.com/huggingface/transformers/blob/beb9b5b02246b9b7ee81ddf938f93f44cfeaad19/src/transformers/utils/attention_visualizer.py#L139
4 changes: 4 additions & 0 deletions docs/source/en/model_doc/glm46v.md
@@ -78,8 +78,12 @@ This model was contributed by [Raushan Turganbay](https://huggingface.co/Raushan

[[autodoc]] Glm46VModel
- forward
- get_video_features
- get_image_features

## Glm46VForConditionalGeneration

[[autodoc]] Glm46VForConditionalGeneration
- forward
- get_video_features
- get_image_features
12 changes: 8 additions & 4 deletions docs/source/en/model_doc/glm4v.md
@@ -215,19 +215,23 @@ print(output_text)
## Glm4vVisionModel

[[autodoc]] Glm4vVisionModel
- forward

## Glm4vTextModel

[[autodoc]] Glm4vTextModel
- forward

## Glm4vModel

[[autodoc]] Glm4vModel
- forward
- get_video_features
- get_image_features

## Glm4vForConditionalGeneration

[[autodoc]] Glm4vForConditionalGeneration
- forward
- get_video_features
- get_image_features
4 changes: 4 additions & 0 deletions docs/source/en/model_doc/glm4v_moe.md
@@ -76,8 +76,12 @@ This model was contributed by [Raushan Turganbay](https://huggingface.co/Raushan

[[autodoc]] Glm4vMoeModel
- forward
- get_video_features
- get_image_features

## Glm4vMoeForConditionalGeneration

[[autodoc]] Glm4vMoeForConditionalGeneration
- forward
- get_video_features
- get_image_features
4 changes: 3 additions & 1 deletion docs/source/en/model_doc/glm_image.md
@@ -16,7 +16,7 @@ limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer.

-->
*This model was released on 2026-01-10 and added to Hugging Face Transformers on 2026-01-13.*

# GlmImage

@@ -199,8 +199,10 @@ print(f"Output tokens: {output_tokens}")

[[autodoc]] GlmImageModel
- forward
- get_image_features

## GlmImageForConditionalGeneration

[[autodoc]] GlmImageForConditionalGeneration
- forward
- get_image_features
1 change: 1 addition & 0 deletions docs/source/en/model_doc/got_ocr2.md
@@ -291,3 +291,4 @@ alt="drawing" width="600"/>

[[autodoc]] GotOcr2ForConditionalGeneration
- forward
- get_image_features
1 change: 1 addition & 0 deletions docs/source/en/model_doc/granite_speech.md
@@ -170,3 +170,4 @@ for i, transcription in enumerate(transcriptions):

[[autodoc]] GraniteSpeechForConditionalGeneration
- forward
- get_audio_features
2 changes: 2 additions & 0 deletions docs/source/en/model_doc/idefics2.md
@@ -208,11 +208,13 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h

[[autodoc]] Idefics2Model
- forward
- get_image_features

## Idefics2ForConditionalGeneration

[[autodoc]] Idefics2ForConditionalGeneration
- forward
- get_image_features

## Idefics2ImageProcessor

2 changes: 2 additions & 0 deletions docs/source/en/model_doc/idefics3.md
Original file line number Diff line number Diff line change
Expand Up @@ -70,11 +70,13 @@ This model was contributed by [amyeroberts](https://huggingface.co/amyeroberts)

[[autodoc]] Idefics3Model
- forward
- get_image_features

## Idefics3ForConditionalGeneration

[[autodoc]] Idefics3ForConditionalGeneration
- forward
- get_image_features

## Idefics3ImageProcessor

1 change: 1 addition & 0 deletions docs/source/en/model_doc/instructblip.md
@@ -78,3 +78,4 @@ The attributes can be obtained from model config, as `model.config.num_query_tok
[[autodoc]] InstructBlipForConditionalGeneration
- forward
- generate
- get_image_features
1 change: 1 addition & 0 deletions docs/source/en/model_doc/instructblipvideo.md
@@ -83,3 +83,4 @@ The attributes can be obtained from model config, as `model.config.num_query_tok
[[autodoc]] InstructBlipVideoForConditionalGeneration
- forward
- generate
- get_video_features
2 changes: 2 additions & 0 deletions docs/source/en/model_doc/internvl.md
@@ -339,11 +339,13 @@ This example showcases how to handle a batch of chat conversations with interlea

[[autodoc]] InternVLModel
- forward
- get_image_features

## InternVLForConditionalGeneration

[[autodoc]] InternVLForConditionalGeneration
- forward
- get_image_features

## InternVLProcessor

1 change: 1 addition & 0 deletions docs/source/en/model_doc/janus.md
@@ -229,6 +229,7 @@ for i, image in enumerate(images['pixel_values']):

[[autodoc]] JanusModel
- forward
- get_image_features

## JanusForConditionalGeneration

1 change: 1 addition & 0 deletions docs/source/en/model_doc/kosmos-2.md
@@ -96,6 +96,7 @@ This model was contributed by [Yih-Dar SHIEH](https://huggingface.co/ydshieh). T

[[autodoc]] Kosmos2Model
- forward
- get_image_features

## Kosmos2ForConditionalGeneration

2 changes: 2 additions & 0 deletions docs/source/en/model_doc/lfm2_vl.md
@@ -92,8 +92,10 @@ processor.batch_decode(outputs, skip_special_tokens=True)[0]

[[autodoc]] Lfm2VlModel
- forward
- get_image_features

## Lfm2VlForConditionalGeneration

[[autodoc]] Lfm2VlForConditionalGeneration
- forward
- get_image_features
2 changes: 2 additions & 0 deletions docs/source/en/model_doc/lighton_ocr.md
@@ -73,8 +73,10 @@ print(output_text)

[[autodoc]] LightOnOcrModel
- forward
- get_image_features

## LightOnOcrForConditionalGeneration

[[autodoc]] LightOnOcrForConditionalGeneration
- forward
- get_image_features
1 change: 1 addition & 0 deletions docs/source/en/model_doc/llama4.md
@@ -426,6 +426,7 @@ model = Llama4ForConditionalGeneration.from_pretrained(

[[autodoc]] Llama4ForConditionalGeneration
- forward
- get_image_features

## Llama4ForCausalLM

1 change: 1 addition & 0 deletions docs/source/en/model_doc/llava.md
@@ -260,3 +260,4 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h

[[autodoc]] LlavaForConditionalGeneration
- forward
- get_image_features
1 change: 1 addition & 0 deletions docs/source/en/model_doc/llava_next.md
@@ -216,3 +216,4 @@ print(processor.decode(output[0], skip_special_tokens=True))

[[autodoc]] LlavaNextForConditionalGeneration
- forward
- get_image_features
2 changes: 2 additions & 0 deletions docs/source/en/model_doc/llava_next_video.md
@@ -259,3 +259,5 @@ model = LlavaNextVideoForConditionalGeneration.from_pretrained(

[[autodoc]] LlavaNextVideoForConditionalGeneration
- forward
- get_image_features
- get_video_features
2 changes: 2 additions & 0 deletions docs/source/en/model_doc/llava_onevision.md
@@ -322,3 +322,5 @@ model = LlavaOnevisionForConditionalGeneration.from_pretrained(

[[autodoc]] LlavaOnevisionForConditionalGeneration
- forward
- get_image_features
- get_video_features