
Allow VLMs to have a correct base_model #41589

Merged
zucchini-nlp merged 22 commits into huggingface:main from zucchini-nlp:vlm-base-model
Nov 18, 2025

Conversation

@zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Oct 14, 2025

What does this PR do?

As per the title: being able to simply call model.base_model is a useful feature that we didn't support in VLMs.

After this PR, there will be no more weird patterns like this:

if hasattr(model, "visual"):
    vision_module = model.visual
elif hasattr(model, "model") and hasattr(model.model, "visual"):
    vision_module = model.model.visual


# Now we can
vision_module = model.base_model.visual

This PR also fixes loading the base model in Llava-like models, which was accidentally deleted. We still need a conversion mapping for legacy checkpoints.

# throws a bunch of missing keys on main, and works with this PR
model = AutoModel.from_pretrained('llava-hf/llava-1.5-7b-hf')
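To make the mechanism concrete, here is a minimal sketch in plain Python (no transformers dependency) of how `base_model` resolution via `base_model_prefix` works; the class names below are illustrative, not the actual library classes:

```python
# Sketch of base_model resolution: the property returns the submodule
# named by `base_model_prefix`, or the model itself if there is none.
# All class names here are hypothetical, for illustration only.

class PreTrainedSketch:
    base_model_prefix = ""

    @property
    def base_model(self):
        # Mirrors the idea in transformers' PreTrainedModel.base_model
        return getattr(self, self.base_model_prefix, self)

class VisionTower:
    pass

class LlavaLikeBase(PreTrainedSketch):
    def __init__(self):
        self.visual = VisionTower()

class LlavaLikeForConditionalGeneration(PreTrainedSketch):
    base_model_prefix = "model"

    def __init__(self):
        self.model = LlavaLikeBase()

model = LlavaLikeForConditionalGeneration()
# With a correct prefix, the vision module is reachable uniformly:
vision_module = model.base_model.visual
print(type(vision_module).__name__)  # -> VisionTower
```

With a correct prefix set on every VLM, callers no longer need the hasattr-chain above; `model.base_model.visual` works regardless of whether the head wrapper is present.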

@zucchini-nlp
Member Author

run-slow: cohere2_vision, flava, florence2, gemma3, gemma3n, glm4v, lfm2_vl, llama4, llava, qwen2_vl

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/cohere2_vision', 'models/flava', 'models/florence2', 'models/gemma3', 'models/gemma3n', 'models/glm4v', 'models/lfm2_vl', 'models/llama4', 'models/llava', 'models/qwen2_vl']
quantizations: [] ...

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp
Member Author

Maybe merge main first?


@zucchini-nlp
Member Author

Looks like the failing tests are the same as on the main branch, and the checkpoints are still loaded correctly without mismatched keys.

@zucchini-nlp
Member Author

zucchini-nlp commented Nov 14, 2025

Now it doesn't touch core modeling anymore, because weight loading was refactored. Instead it fixes loading the base model in Llava-like models, which was accidentally deleted. We still need a conversion mapping for legacy checkpoints.

And, as per the title, it makes sure that all VLMs have a correct base_model_prefix. I made sure that loading with model = AutoModel.from_pretrained('llava-hf/llava-1.5-7b-hf') doesn't throw errors about missing keys.

I will request non core-maintainer review after slow tests ✅

@zucchini-nlp
Member Author

run-slow: aria, aya_vision, blt, emu3, flava, gemma3, gemma3n, glm4v, glm4v_moe, got_ocr2, internvl, llama4, llava, llava_next, llava_next_video, llava_onevision

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/aria", "models/aya_vision", "models/blt", "models/emu3", "models/flava", "models/gemma3", "models/gemma3n", "models/glm4v", "models/glm4v_moe", "models/got_ocr2", "models/internvl", "models/llama4", "models/llava", "models/llava_next", "models/llava_next_video", "models/llava_onevision"]
quantizations: []

@zucchini-nlp
Member Author

Test failures aren't related, cc @molbap whenever you have a chance 😄

@zucchini-nlp zucchini-nlp requested a review from molbap November 14, 2025 10:57
@github-actions
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

Contributor

@molbap molbap left a comment


Very nice, thanks for the extensive and needed testing! Not much to say except "can we add a test" to make sure when we push new VLMs we don't deviate from this?

@zucchini-nlp
Member Author

Yep, I think a test can be added, and it will also help nudge contributors toward the standard modeling format. If a model is supposed to deviate from the standard, they can easily skip the test.

Will add one today if I get time
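For models that legitimately deviate, opting out would be the usual unittest skip. A hypothetical sketch (the test-class name is made up for illustration; only the decorator pattern is the point):

```python
import unittest

# Hypothetical model test class: a model with a non-standard architecture
# opts out of the shared base_model check with an explanatory reason.
class MySpecialVLMModelTest(unittest.TestCase):
    @unittest.skip("Non-standard architecture: no separate base model")
    def test_model_base_model_prefix(self):
        self.fail("never runs")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(MySpecialVLMModelTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(len(result.skipped))  # -> 1
```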

@zucchini-nlp
Member Author

Tests in main are broken 😢 waiting for a fix so I can merge

Comment on lines +1945 to +1957
def test_model_base_model_prefix(self):
    """
    Normally a generative model is a base model + lm_head on top. If this test
    fails for a new model, the model probably has an incorrect `base_model_prefix`,
    or you are re-defining base blocks for a generative model.
    Some models might not fit this assumption if they have a special
    architecture. Feel free to skip the test in that case, with a reason
    in the description.
    """
    for model_class in self.all_generative_model_classes:
        config, _ = self.model_tester.prepare_config_and_inputs_for_common()
        model = model_class(config)
        self.assertTrue(model.base_model is not model)
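A toy illustration (not the actual transformers classes) of what this assertion catches: with a correct `base_model_prefix` the head model exposes a distinct base model, while a missing prefix makes `base_model` fall back to the model itself, which the test would flag.

```python
# Hypothetical minimal classes mimicking the base_model property.
class Sketch:
    base_model_prefix = ""

    @property
    def base_model(self):
        # Return the submodule named by the prefix, or self if absent.
        return getattr(self, self.base_model_prefix, self)

class Backbone(Sketch):
    pass

class GoodForCausalLM(Sketch):
    base_model_prefix = "model"

    def __init__(self):
        self.model = Backbone()

class BadForCausalLM(Sketch):
    pass  # no prefix set, so base_model resolves to the model itself

good, bad = GoodForCausalLM(), BadForCausalLM()
print(good.base_model is not good)  # -> True: the assertion passes
print(bad.base_model is not bad)    # -> False: the assertion would fail
```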
Contributor


That's pretty good I think, thanks!

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: aria, aya_vision, blt, clvp, cohere2_vision, emu3, flava, florence2, gemma3, gemma3n, glm46v, glm4v, glm4v_moe, got_ocr2, internvl, jetmoe

@zucchini-nlp zucchini-nlp enabled auto-merge (squash) November 18, 2025 10:17
@zucchini-nlp zucchini-nlp merged commit c40b370 into huggingface:main Nov 18, 2025
23 checks passed
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
* allow VLMs to have a correct `base_model`

* fix copies

* fix copies?

* empty commit

* fix copies

* nits after rebase

* fix copies

* add a test

* skip more tests

* fiix copies, ig have to do it in all PRs after rebase
