clean up vision/text config dict arguments #19954
Without this PR, we get somewhat surprising/confusing results:

from transformers import CLIPConfig, CLIPModel

config = CLIPConfig.from_pretrained("openai/clip-vit-base-patch16")
print(config.vision_config.patch_size)
print(config.vision_config_dict["patch_size"])

config.vision_config.patch_size = 32
config.save_pretrained("v2")
config_v2 = CLIPConfig.from_pretrained("v2")
# This is not `32`, which is unexpected!
# In fact, `vision_config_dict` is what is used during loading to set `vision_config`.
print(config_v2.vision_config.patch_size)
# This is 32 - unexpected!
print(config_v2.vision_config_dict["patch_size"])

config.vision_config_dict["patch_size"] = 32
config.save_pretrained("v3")
config_v3 = CLIPConfig.from_pretrained("v3")
# This is 32 - unexpected!
print(config_v3.vision_config.patch_size)
# This is 32 - OK
print(config_v3.vision_config_dict["patch_size"])
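The mechanism described in the comments above (the saved dict copy, not the sub-config object, drives loading) can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyVisionConfig` and `ToyCompositeConfig` are hypothetical stand-ins written for this illustration, not the actual `transformers` classes, and the real loading path has more steps.

```python
import copy


class ToyVisionConfig:
    """Hypothetical stand-in for a vision sub-config."""

    def __init__(self, patch_size=16):
        self.patch_size = patch_size

    def to_dict(self):
        return {"patch_size": self.patch_size}


class ToyCompositeConfig:
    """Hypothetical composite config that keeps two copies of the same data."""

    def __init__(self, vision_config_dict=None, **kwargs):
        if vision_config_dict is None:
            vision_config_dict = {}
        # The raw dict is kept as an attribute ...
        self.vision_config_dict = vision_config_dict
        # ... and the sub-config object is always rebuilt from that dict.
        # A serialized "vision_config" entry lands in **kwargs and is ignored.
        self.vision_config = ToyVisionConfig(**vision_config_dict)

    def to_dict(self):
        # Both copies are written out, so they can disagree on disk.
        return {
            "vision_config_dict": copy.deepcopy(self.vision_config_dict),
            "vision_config": self.vision_config.to_dict(),
        }

    @classmethod
    def from_dict(cls, data):
        return cls(**data)


config = ToyCompositeConfig(vision_config_dict={"patch_size": 16})
config.vision_config.patch_size = 32  # edit only the object, not the dict copy
saved = config.to_dict()
reloaded = ToyCompositeConfig.from_dict(saved)
print(saved["vision_config"]["patch_size"])  # 32 in the serialized object entry
print(reloaded.vision_config.patch_size)     # 16: the edit is lost in this toy model
```

In this toy model the edit to the object is serialized but never read back, because the constructor only looks at the dict copy.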
-        super().__init__(text_config_dict=text_config_dict, vision_config_dict=vision_config_dict, **kwargs)
+        super().__init__(**kwargs)

         # If `_config_dict` exist, we use them for the backward compatibility.
For backward compatibility
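One way to keep old saved configs loadable is to accept the legacy `*_config_dict` keys and fold them into the new arguments. The sketch below is a guess at that pattern based on the code comment under review, not the merged implementation; `ToyConfig` is a hypothetical class, and letting the legacy key take precedence is an assumption.

```python
class ToyConfig:
    """Hypothetical composite config that accepts legacy `*_config_dict` keys."""

    def __init__(self, text_config=None, vision_config=None, **kwargs):
        # Older saved files may still carry the legacy keys; pop them so they
        # are not stored as separate attributes.
        text_config_dict = kwargs.pop("text_config_dict", None)
        vision_config_dict = kwargs.pop("vision_config_dict", None)

        # Assumption: when a legacy key is present, it takes precedence.
        if text_config_dict is not None:
            text_config = text_config_dict
        if vision_config_dict is not None:
            vision_config = vision_config_dict

        self.text_config = dict(text_config or {})
        self.vision_config = dict(vision_config or {})


# A config saved in the old format still loads, but only the new attribute exists.
legacy = ToyConfig(vision_config_dict={"patch_size": 32})
print(legacy.vision_config)
print(hasattr(legacy, "vision_config_dict"))
```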
@sgugger If you are happy with the current change, I will apply the changes to some other models, and the testing files.

         output["text_config"] = self.text_config.to_dict()
         output["vision_config"] = self.vision_config.to_dict()

         **kwargs
     ):
-        super().__init__(text_config=text_config, vision_config=vision_config, **kwargs)
+        super().__init__(**kwargs)
We don't need to pass the text/vision configs to `super()`, as we set `self.text_config` and `self.vision_config` below.
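The point can be checked with a toy base class that stores unknown keyword arguments as attributes. This is a sketch only: `ToyBaseConfig`, `PassesToSuper` and `SkipsSuper` are hypothetical names, and the real base class does more than this.

```python
class ToyBaseConfig:
    """Hypothetical base class: stores every keyword argument as an attribute."""

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)


class PassesToSuper(ToyBaseConfig):
    def __init__(self, text_config=None, vision_config=None, **kwargs):
        # The base class sets both attributes here ...
        super().__init__(text_config=text_config, vision_config=vision_config, **kwargs)
        # ... and they are overwritten immediately afterwards.
        self.text_config = dict(text_config or {})
        self.vision_config = dict(vision_config or {})


class SkipsSuper(ToyBaseConfig):
    def __init__(self, text_config=None, vision_config=None, **kwargs):
        super().__init__(**kwargs)
        self.text_config = dict(text_config or {})
        self.vision_config = dict(vision_config or {})


a = PassesToSuper(vision_config={"patch_size": 16}, projection_dim=512)
b = SkipsSuper(vision_config={"patch_size": 16}, projection_dim=512)
print(vars(a) == vars(b))  # both variants end up with the same attributes
```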
sgugger left a comment
LGTM, but pinging @patrickvonplaten and @patil-suraj here too as it may have implications in Diffusers.
Awesome that you are working on fixing this! Encountered the same issue with a new model I'm working on called CLIPSeg. Also, could we update
3b33586 to a983507
* clean up
* For backward compatibility
* clean up
* Same changes for more models

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
What does this PR do?
Remove `vision_config_dict` and `text_config_dict`: just use `vision_config` and `text_config`.
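With a single source of truth, an edit to the sub-config object should survive a save/load round trip. The toy model below sketches that behaviour, serializing from the sub-config object in the same way as the `output["vision_config"] = self.vision_config.to_dict()` line in the diff. `ToySubConfig` and `ToyCleanConfig` are hypothetical classes for illustration, not the library's API.

```python
import copy


class ToySubConfig:
    """Hypothetical stand-in for a vision sub-config."""

    def __init__(self, patch_size=16):
        self.patch_size = patch_size

    def to_dict(self):
        return {"patch_size": self.patch_size}


class ToyCleanConfig:
    """Hypothetical composite config with only `vision_config`, no dict copy."""

    def __init__(self, vision_config=None, **kwargs):
        self.vision_config = ToySubConfig(**(vision_config or {}))

    def to_dict(self):
        output = copy.deepcopy(self.__dict__)
        # Serialize from the live object, so in-place edits are captured.
        output["vision_config"] = self.vision_config.to_dict()
        return output


config = ToyCleanConfig(vision_config={"patch_size": 16})
config.vision_config.patch_size = 32
round_tripped = ToyCleanConfig(**config.to_dict())
print(round_tripped.vision_config.patch_size)  # 32: the edit survives
```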