
Allow VLMs to have a correct base_model #41589

Merged
zucchini-nlp merged 22 commits into huggingface:main from zucchini-nlp:vlm-base-model
Nov 18, 2025

Conversation

@zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Oct 14, 2025

What does this PR do?

As per the title: being able to simply call model.base_model is a useful feature that we didn't support in VLMs.

After this PR, there will be no more weird patterns like this:

if hasattr(model, "visual"):
    vision_module = model.visual
elif hasattr(model, "model") and hasattr(model.model, "visual"):
    vision_module = model.model.visual


# Now we can
vision_module = model.base_model.visual

This PR also fixes loading the base model in Llava-like models, which was accidentally deleted. We still need a conversion mapping for legacy checkpoints.

# throws a bunch of missing keys on main, and works with this PR
model = AutoModel.from_pretrained('llava-hf/llava-1.5-7b-hf')
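To make the mechanism concrete, here is a minimal sketch in plain Python (no transformers dependency) of how `base_model` resolution via `base_model_prefix` works; the class names below are illustrative, not the actual library classes:

```python
# Sketch of base_model resolution: the property returns the submodule
# named by `base_model_prefix`, or the model itself if there is none.
# All class names here are hypothetical, for illustration only.

class PreTrainedSketch:
    base_model_prefix = ""

    @property
    def base_model(self):
        # Mirrors the idea in transformers' PreTrainedModel.base_model
        return getattr(self, self.base_model_prefix, self)

class VisionTower:
    pass

class LlavaLikeBase(PreTrainedSketch):
    def __init__(self):
        self.visual = VisionTower()

class LlavaLikeForConditionalGeneration(PreTrainedSketch):
    base_model_prefix = "model"

    def __init__(self):
        self.model = LlavaLikeBase()

model = LlavaLikeForConditionalGeneration()
# With a correct prefix, the vision module is reachable uniformly:
vision_module = model.base_model.visual
print(type(vision_module).__name__)  # -> VisionTower
```

With a correct prefix set on every VLM, callers no longer need the hasattr-chain above; `model.base_model.visual` works regardless of whether the head wrapper is present.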

@zucchini-nlp
Member Author

run-slow: cohere2_vision, flava, florence2, gemma3, gemma3n, glm4v, lfm2_vl, llama4, llava, qwen2_vl

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/cohere2_vision', 'models/flava', 'models/florence2', 'models/gemma3', 'models/gemma3n', 'models/glm4v', 'models/lfm2_vl', 'models/llama4', 'models/llava', 'models/qwen2_vl']
quantizations: [] ...

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp
Member Author

Maybe merge main first?


@zucchini-nlp
Member Author

Looks like the failing tests are the same as on the main branch, and the checkpoints are still loaded correctly without mismatched keys.

@zucchini-nlp
Member Author

zucchini-nlp commented Nov 14, 2025

Now it doesn't touch core modeling anymore, because weight loading was refactored. Instead it fixes loading the base model in Llava-like models, which was accidentally deleted. We still need a conversion mapping for legacy checkpoints.

And, as per the title, it makes sure that all VLMs have a correct base_model_prefix. I made sure that loading with model = AutoModel.from_pretrained('llava-hf/llava-1.5-7b-hf') doesn't throw errors about missing keys.

I will request non core-maintainer review after slow tests ✅

@zucchini-nlp
Member Author

run-slow: aria, aya_vision, blt, emu3, flava, gemma3, gemma3n, glm4v, glm4v_moe, got_ocr2, internvl, llama4, llava, llava_next, llava_next_video, llava_onevision

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/aria", "models/aya_vision", "models/blt", "models/emu3", "models/flava", "models/gemma3", "models/gemma3n", "models/glm4v", "models/glm4v_moe", "models/got_ocr2", "models/internvl", "models/llama4", "models/llava", "models/llava_next", "models/llava_next_video", "models/llava_onevision"]
quantizations: []

@zucchini-nlp
Member Author

Test failures aren't related, cc @molbap whenever you have a chance 😄

@zucchini-nlp zucchini-nlp requested a review from molbap November 14, 2025 10:57
@github-actions
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

Contributor

@molbap molbap left a comment


Very nice, thanks for the extensive and needed testing! Not much to say except "can we add a test" to make sure when we push new VLMs we don't deviate from this?

@zucchini-nlp
Member Author

Yep, I think a test can be added, and it will also help nudge contributors toward the standard modeling format. If a model is supposed to deviate from the standard, they can easily skip the test.

Will add one today if I get time
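For models that legitimately deviate, opting out would be the usual unittest skip. A hypothetical sketch (the test-class name is made up for illustration; only the decorator pattern is the point):

```python
import unittest

# Hypothetical model test class: a model with a non-standard architecture
# opts out of the shared base_model check with an explanatory reason.
class MySpecialVLMModelTest(unittest.TestCase):
    @unittest.skip("Non-standard architecture: no separate base model")
    def test_model_base_model_prefix(self):
        self.fail("never runs")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(MySpecialVLMModelTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(len(result.skipped))  # -> 1
```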

@zucchini-nlp
Member Author

Tests in main are broken 😢 waiting for a fix so I can merge

Comment on lines +1945 to +1957
def test_model_base_model_prefix(self):
    """
    Normally a generative model is a base model + lm_head on top. If this test
    fails for a new model, the model probably has an incorrect `base_model_prefix`,
    or you are re-defining base blocks for a generative model.
    Some models might not fit this assumption if they have a special
    architecture. Feel free to skip the test in that case, with a reason
    in the description.
    """
    for model_class in self.all_generative_model_classes:
        config, _ = self.model_tester.prepare_config_and_inputs_for_common()
        model = model_class(config)
        self.assertTrue(model.base_model is not model)
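A toy illustration (not the actual transformers classes) of what this assertion catches: with a correct `base_model_prefix` the head model exposes a distinct base model, while a missing prefix makes `base_model` fall back to the model itself, which the test would flag.

```python
# Hypothetical minimal classes mimicking the base_model property.
class Sketch:
    base_model_prefix = ""

    @property
    def base_model(self):
        # Return the submodule named by the prefix, or self if absent.
        return getattr(self, self.base_model_prefix, self)

class Backbone(Sketch):
    pass

class GoodForCausalLM(Sketch):
    base_model_prefix = "model"

    def __init__(self):
        self.model = Backbone()

class BadForCausalLM(Sketch):
    pass  # no prefix set, so base_model resolves to the model itself

good, bad = GoodForCausalLM(), BadForCausalLM()
print(good.base_model is not good)  # -> True: the assertion passes
print(bad.base_model is not bad)    # -> False: the assertion would fail
```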
Contributor


That's pretty good I think, thanks!

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: aria, aya_vision, blt, clvp, cohere2_vision, emu3, flava, florence2, gemma3, gemma3n, glm46v, glm4v, glm4v_moe, got_ocr2, internvl, jetmoe

@zucchini-nlp zucchini-nlp enabled auto-merge (squash) November 18, 2025 10:17
@zucchini-nlp zucchini-nlp merged commit c40b370 into huggingface:main Nov 18, 2025
23 checks passed
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
* allow VLMs to have a correct `base_model`

* fix copies

* fix copies?

* empty commit

* fix copies

* nits after rebase

* fix copies

* add a test

* skip more tests

* fiix copies, ig have to do it in all PRs after rebase
