
[fix] Use last_hidden_state key from get_image_features for llama4 #43882

Merged
vasqu merged 1 commit into huggingface:main from tomaarsen:fix/llama4 on Feb 10, 2026

Conversation

@tomaarsen
Member

What does this PR do?

Resolves #42564 (comment)
#42564 updated get_image_features for Llama4, but it erroneously started using pooler_output instead of the previous last_hidden_state. Additionally, the Llama4VisionModel output has been updated to BaseModelOutputWithPooling to match many vision models.
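
The failure mode can be sketched in isolation. This is a hypothetical, standalone illustration: the dataclasses below only mirror the names of transformers' output classes, and the two `get_image_features_*` helpers are invented stand-ins for the pre-fix and post-fix behaviour, not the library code:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BaseModelOutput:
    # Token-level features for every patch/position.
    last_hidden_state: list

@dataclass
class BaseModelOutputWithPooling(BaseModelOutput):
    # Extra pooled representation; absent on the plain base class.
    pooler_output: Optional[list] = None

def get_image_features_broken(vision_output):
    # Pre-fix behaviour: assumes a pooled output always exists.
    return vision_output.pooler_output  # AttributeError on plain BaseModelOutput

def get_image_features_fixed(vision_output):
    # Post-fix behaviour: use the token-level features, as before #42564.
    return vision_output.last_hidden_state

plain = BaseModelOutput(last_hidden_state=[[1.0, 2.0]])
print(get_image_features_fixed(plain))   # works: [[1.0, 2.0]]
try:
    get_image_features_broken(plain)
except AttributeError as e:
    print("broken path:", e)
```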

Reproducer

import torch

from transformers import AutoProcessor, Llama4ForConditionalGeneration

model_id = "hf-internal-testing/tiny-random-llama4"
processor = AutoProcessor.from_pretrained(model_id)
model = Llama4ForConditionalGeneration.from_pretrained(
    model_id,
    attn_implementation="sdpa",  # flex attention / flash_attention_2 do not work, debugging...
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Fix meta tensor: convert to float32 with random weights
if model.vision_model.rotary_embedding.freqs_ci.is_meta:
    shape = model.vision_model.rotary_embedding.freqs_ci.shape
    # Note: Ideally should compute proper RoPE frequencies, but using random as requested
    model.vision_model.rotary_embedding.freqs_ci = torch.randn(*shape, dtype=torch.float32, device=model.device)

url1 = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
url2 = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/datasets/cat_style_layout.png"
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": url1},
            {"type": "image", "url": url2},
            {"type": "text", "text": "Can you describe how these two images are similar, and how they differ?"},
        ]
    },
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=32,
)

response = processor.batch_decode(outputs[:, inputs["input_ids"].shape[-1]:])[0]
print(response)
print(outputs[0])

On main:

Traceback (most recent call last):
  File "c:\code\transformers\demo_llama4.py", line 41, in <module>
    outputs = model.generate(
              ^^^^^^^^^^^^^^^
  File "C:\Users\tom\.conda\envs\transformers\Lib\site-packages\torch\utils\_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\code\transformers\src\transformers\generation\utils.py", line 2638, in generate
    result = decoding_method(
             ^^^^^^^^^^^^^^^^
  File "C:\code\transformers\src\transformers\generation\utils.py", line 2833, in _sample
    outputs = self._prefill(input_ids, generation_config, model_kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\code\transformers\src\transformers\generation\utils.py", line 3822, in _prefill
    return self(**model_inputs, return_dict=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tom\.conda\envs\transformers\Lib\site-packages\torch\nn\modules\module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tom\.conda\envs\transformers\Lib\site-packages\torch\nn\modules\module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\code\transformers\src\transformers\utils\generic.py", line 1016, in wrapper
    outputs = func(self, *args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\code\transformers\src\transformers\models\llama4\modeling_llama4.py", line 1327, in forward
    ).pooler_output
      ^^^^^^^^^^^^^
AttributeError: 'BaseModelOutput' object has no attribute 'pooler_output'

On this PR:

广场 meno बतздрав 공기грамمعграмمعграмمعграмمعграмمعграмمعграмمعграмمعграмمعграмمعграмمعграмمعграмمعграм
tensor([200000, 200005,   1556,  ...,  96359,  20938,  96359], device='cuda:0')

(Gibberish obviously as this is a tiny-random model, but it works now!)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@zucchini-nlp @vasqu

Thank you to @Mecoli1219 for reporting this.

  • Tom Aarsen

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: llama4

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp (Member) left a comment

Btw, do we not have llama4 fast tests running to check this?

Comment on lines -1169 to 1171:

-        return BaseModelOutput(
+        return BaseModelOutputWithPooling(
             last_hidden_state=hidden_state,
             hidden_states=hidden_states,
Member

I think the last_hidden_state is the one after layernorm_post, and the pooler output is after the adapter. Though it'll be a breaking change...
Fine with leaving it as is, thanks.
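
The ordering being debated can be sketched without the real model. Everything below is a toy stand-in: `layernorm_post` and `adapter` only borrow their names from the comment above, and the transforms are placeholders, not Llama4's actual ops:

```python
def layernorm_post(hidden_state):
    # Toy stand-in for the vision tower's post-layernorm (just mean-centres).
    mean = sum(hidden_state) / len(hidden_state)
    return [v - mean for v in hidden_state]

def adapter(hidden_state):
    # Toy stand-in for the multimodal adapter/projector.
    return [2.0 * v for v in hidden_state]

hidden_state = [1.0, 2.0, 3.0]

# The "ideal" (but breaking) split suggested in the comment:
#   last_hidden_state = output of layernorm_post
#   pooler_output     = output of the adapter
last_hidden_state = layernorm_post(hidden_state)
pooler_output = adapter(last_hidden_state)

print(last_hidden_state)  # [-1.0, 0.0, 1.0]
print(pooler_output)      # [-2.0, 0.0, 2.0]
```

The merged PR deliberately keeps the pre-existing assignment rather than adopting this split, since changing it post-v5 would break downstream consumers of `last_hidden_state`.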

Member Author

You're right :/ Damn, I wish I'd spotted that before v5, as it's indeed breaking if we improve it. I did make roughly this change for a few other architectures pre-v5.

@tomaarsen
Member Author

Btw, do we not have llama4 fast tests running to check this?

No, I was surprised as well. We only have slow tests.

@zucchini-nlp
Member

zucchini-nlp commented Feb 10, 2026

Super weird tbh. The model arch is pretty straightforward, so there should be no reason to skip the ModelCommonTest mixin.
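
For illustration, a shared fast-test mixin works roughly like this. The class and method names below are invented stand-ins for transformers' real common-test machinery, not its actual test code:

```python
import unittest

class CommonModelTests:
    """Toy stand-in for a shared fast-test mixin: checks every model test class
    inherits for free, run on a tiny randomly initialised model."""

    def test_output_has_last_hidden_state(self):
        # A cheap structural check of this shape would have caught the
        # pooler_output regression without any slow/GPU tests.
        out = self.build_tiny_model_output()
        self.assertIn("last_hidden_state", out)

class Llama4FastTest(CommonModelTests, unittest.TestCase):
    def build_tiny_model_output(self):
        # The real suite would run a tiny forward pass here; we fake the output.
        return {"last_hidden_state": [[0.0]], "hidden_states": None}

suite = unittest.defaultTestLoader.loadTestsFromTestCase(Llama4FastTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```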

@vasqu (Contributor) left a comment

Thanks, yeah, we don't have fast tests for llama4 (which I also wondered about, but I think it was done in a rush, so it happens).

@ArthurZucker
Collaborator

Yep, we had an insane rush; it was the first BIG BIG model 😢

@zucchini-nlp
Member

zucchini-nlp commented Feb 10, 2026

I see, it would be great to add it, but I don't want to block this PR. Will add it to my todo notes, maybe one day 🙃

@vasqu vasqu merged commit 71d7fc5 into huggingface:main Feb 10, 2026
25 checks passed
Tcc0403 added a commit to linkedin/Liger-Kernel that referenced this pull request Feb 10, 2026
## Summary
Fix convergence tests for Transformers v5, including updating some
mismatched variables, increasing the tolerance for some tests, etc.
The only remaining error is expected and should be fixed in
[Transformers#43882](huggingface/transformers#43882).
## Related Issues & PRs
- #978 
## Testing Done

- Hardware Type: H100
- [ ] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [x] run `make test-convergence` to ensure convergence on v4 & v5

---------

Co-authored-by: Tcc0403 <76503978+Tcc0403@users.noreply.github.com>
jiosephlee pushed a commit to jiosephlee/transformers_latest that referenced this pull request Feb 11, 2026
…ma4 (huggingface#43882)

Use last_hidden_state key from get_image_features for llama4
