Deprecate dtype per sub config#42990
Conversation
|
run-slow: llava, llava_next, llava_next_video, llava_onevision |
|
This comment contains models: ["models/llava", "models/llava_next", "models/llava_next_video", "models/llava_onevision"] |
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
CI Results✅ No failing test specific to this PR 🎉 ! |
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
| logger.warning_once( | ||
| f"The dtype for {sub_config_key}={sub_dtype} is different from the main dtype={main_dtype}. " | ||
| "Setting different dtypes per backbone model might cause device errors downstream, setting the " | ||
| f"dtype={main_dtype} for all modules. Note, using different dtypes per module is deprecated and will " | ||
| "be removed in future versions." | ||
| ) |
There was a problem hiding this comment.
It means that in the future it will raise an exception, right?
There was a problem hiding this comment.
yes, after a few released we can switch to an error
There was a problem hiding this comment.
It means that if a user sets dtype="auto", all sub modules will be forcibly cast to the specified main_dtype?
Cyrilvallez
left a comment
There was a problem hiding this comment.
If we do that, then we need to plainly remove dtype being passed as a dict as well!
Cyrilvallez
left a comment
There was a problem hiding this comment.
Alright, if there is not many models with different dtypes in subconfigs, it's indeed better to only show the warning when explicitly passing a dict, and silently skipping otherwise!
* deprecate dtypes per sub config * fix test * Update tests/utils/test_modeling_utils.py Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> * address comment --------- Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
What does this PR do?
Fixes #42968, #42641
As discussed, this PR deprecated different dtypes per backbone and falls back to the main dtype if a user is requesting different dtypes.
There is no clean way to support different dtypes when the backbone can have a task head. We either cast hidden states to task dtype in all model files, or we need a workaround to infer task head and force dtype on it when loading. So the better solution is to deprecate dict dtypes, apparently it is not used much