Allow VLMs to have a correct `base_model` #41589
zucchini-nlp merged 22 commits into huggingface:main
Conversation
|
run-slow: cohere2_vision, flava, florence2, gemma3, gemma3n, glm4v, lfm2_vl, llama4, llava, qwen2_vl |
|
This comment contains run-slow, running the specified jobs: models: ['models/cohere2_vision', 'models/flava', 'models/florence2', 'models/gemma3', 'models/gemma3n', 'models/glm4v', 'models/lfm2_vl', 'models/llama4', 'models/llava', 'models/qwen2_vl'] |
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
Maybe merge main first? |
|
Looks like the failing tests are the same as in main |
|
Now it doesn't touch core modeling anymore, because weight loading was refactored. Instead, it fixes loading the base model in Llava-like models, which was accidentally deleted. We still need a conversion mapping for legacy checkpoints. And as per the title, it makes sure that all VLMs have a correct `base_model`. I will request non-core-maintainer review after slow tests ✅ |
|
run-slow: aria, aya_vision, blt, emu3, flava, gemma3, gemma3n, glm4v, glm4v_moe, got_ocr2, internvl, llama4, llava, llava_next, llava_next_video, llava_onevision |
|
This comment contains models: ["models/aria", "models/aya_vision", "models/blt", "models/emu3", "models/flava", "models/gemma3", "models/gemma3n", "models/glm4v", "models/glm4v_moe", "models/got_ocr2", "models/internvl", "models/llama4", "models/llava", "models/llava_next", "models/llava_next_video", "models/llava_onevision"] |
|
Test failures aren't related, cc @molbap whenever you have a chance 😄 |
CI Results: ✅ No failing test specific to this PR 🎉 |
molbap left a comment:
Very nice, thanks for the extensive and needed testing! Not much to say except "can we add a test" to make sure when we push new VLMs we don't deviate from this?
|
Yep, I think a test can be done, and it will also help nudge contributors to use the standard modeling format. If a model is supposed to deviate from the standard, they can easily skip the test. Will add one today if I get time |
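A minimal sketch of how a model's test class could opt out of the shared check, as suggested above. The class name and skip reason below are illustrative, not taken from the PR:

```python
import unittest


class SomeSpecialVLMModelTest(unittest.TestCase):
    # Hypothetical test class for a model whose architecture has no
    # separate base model; the skip reason documents the deviation.
    @unittest.skip(reason="Special architecture: no separate base model to expose")
    def test_model_base_model_prefix(self):
        # The inherited check would run here for standard models.
        raise AssertionError("never executed when skipped")
```

Because `unittest.skip` records the reason, the deviation stays visible in the test report instead of being silently dropped.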
|
Tests in main are broken 😢 waiting for a fix so I can merge |
```python
def test_model_base_model_prefix(self):
    """
    Normally a generative model is a base model + lm_head on top. If this test
    fails for a new model, the model probably has an incorrect `base_model_prefix`,
    or you are re-defining base blocks for a generative model.
    Some models might not fit this assumption if they have a special
    architecture. Feel free to skip the test in that case, with a reason
    in the description.
    """
    for model_class in self.all_generative_model_classes:
        config, _ = self.model_tester.prepare_config_and_inputs_for_common()
        model = model_class(config)
        self.assertTrue(model.base_model is not model)
```
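A self-contained sketch of what the test above checks, using toy classes in place of real transformers models (all names below are illustrative stand-ins, not library classes):

```python
import unittest


class ToyBackbone:
    """Stands in for a bare backbone model."""


class ToyForCausalLM:
    # Declares which attribute holds the backbone, as real models do
    # via `base_model_prefix`.
    base_model_prefix = "backbone"

    def __init__(self, config=None):
        self.backbone = ToyBackbone()

    @property
    def base_model(self):
        # Mirrors the behavior the test relies on: return the submodule
        # named by `base_model_prefix`, or `self` when it is absent.
        return getattr(self, self.base_model_prefix, self)


class ToyBaseModelPrefixTest(unittest.TestCase):
    all_generative_model_classes = (ToyForCausalLM,)

    def test_model_base_model_prefix(self):
        for model_class in self.all_generative_model_classes:
            model = model_class(config=None)
            self.assertTrue(model.base_model is not model)
```

If a model re-defined its base blocks inline instead of holding a backbone submodule, `base_model` would fall back to `self` and the assertion would fail, which is exactly the deviation the real test is meant to catch.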
That's pretty good I think, thanks!
|
[For maintainers] Suggested jobs to run (before merge) run-slow: aria, aya_vision, blt, clvp, cohere2_vision, emu3, flava, florence2, gemma3, gemma3n, glm46v, glm4v, glm4v_moe, got_ocr2, internvl, jetmoe |
* allow VLMs to have a correct `base_model`
* fix copies
* fix copies?
* empty commit
* fix copies
* nits after rebase
* fix copies
* add a test
* skip more tests
* fiix copies, ig have to do it in all PRs after rebase
What does this PR do?
As per the title, being able to simply call `model.base_model` is a useful feature that we didn't support in VLMs. There will be no more weird patterns like this after this PR.
The PR also fixes loading the base model in Llava-like models, which was accidentally deleted. We still need a conversion mapping for legacy checkpoints.
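A sketch of the kind of access pattern this unifies. The class and attribute names are illustrative, not the real transformers internals:

```python
class LanguageBackbone:
    """Stands in for the text backbone of a composite VLM."""


class ToyLlavaForConditionalGeneration:
    # Hypothetical Llava-like composite model; the prefix names the
    # attribute holding the backbone.
    base_model_prefix = "model"

    def __init__(self):
        self.model = LanguageBackbone()

    @property
    def base_model(self):
        # Uniform accessor resolved through `base_model_prefix`.
        return getattr(self, self.base_model_prefix, self)


vlm = ToyLlavaForConditionalGeneration()

# Before: every caller had to know each model's internal attribute name.
backbone_before = vlm.model

# After: one uniform accessor works across all VLMs.
backbone_after = vlm.base_model
assert backbone_before is backbone_after
```

The point of the change is that downstream code can write `model.base_model` once instead of special-casing each VLM's attribute layout.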