[Vision] Support different floating precision inputs from the ImageProcessor
#20453
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
I think this PR is ready, at least as a PoC!
sgugger left a comment
I am not 100% sure how the float precision is used in other frameworks. If we use it only for PyTorch, I would concentrate support in the to method, just making sure it also accepts a torch.device without adding a new argument for the float_precision (something that should be added to this PR in any case).
If it is used in other frameworks, your approach seems right, though for readability I would just do the loop in every branch of the test instead of creating cast_fun and is_floating.
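For illustration, a minimal sketch of the kind of to-style casting this suggestion points at (a hypothetical cast_features helper, not code from this PR; how positional device arguments are handled is a simplifying assumption):

```python
import torch
from transformers import BatchFeature


def cast_features(features: BatchFeature, *args, **kwargs) -> BatchFeature:
    """Forward torch.Tensor.to arguments (a torch.device and/or torch.dtype) to the
    tensors in a BatchFeature, casting only floating-point tensors so that integer
    inputs such as input_ids keep their dtype."""
    new_data = {}
    for key, value in features.items():
        if isinstance(value, torch.Tensor) and torch.is_floating_point(value):
            # Floating-point tensors get the full treatment: dtype cast and/or device move.
            new_data[key] = value.to(*args, **kwargs)
        elif isinstance(value, torch.Tensor) and kwargs.get("device") is not None:
            # Non-floating tensors are only moved to the requested device, never re-typed.
            new_data[key] = value.to(device=kwargs["device"])
        else:
            new_data[key] = value
    return BatchFeature(data=new_data)
```

With such a helper, a call like cast_features(inputs, device="cuda", dtype=torch.float16) would move and cast pixel_values while leaving integer tensors untouched.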
ydshieh left a comment
I am fine with supporting float precision for all frameworks.
Thank you for working on this.
for key, value in self.items():
    # sanity check that we check for only tensors
    if is_tensor(value):
        if is_floating(value):
I am not very comfortable calling these is_tensor and is_floating without checking that value is from target_framework.
For example, what about tensor_type being pt while value is a tf tensor?
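To illustrate the concern (a hypothetical helper, not code from this PR), the check could be made framework-specific so that a tensor from another framework is simply skipped:

```python
def is_torch_floating_tensor(value) -> bool:
    # True only for PyTorch floating-point tensors, so e.g. a TensorFlow tensor
    # passed alongside tensor_type="pt" is skipped rather than mis-handled.
    try:
        import torch
    except ImportError:
        return False
    return isinstance(value, torch.Tensor) and torch.is_floating_point(value)
```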
Thanks for the feedback! I guess this would not happen, since that check is already done in the convert_to_tensors function, which is called right before.
I agree with you @younesbelkada. However, this method is added as a public method, so the concern is there (though I doubt any user will use it). If it is prefixed with _, I won't complain at all :-)
Let @sgugger review and give us his opinion on whether we should make any effort on such things.
Ahh I see now! Yes, then it makes sense to have it prefixed with _ 💪
My opinion has been stated above. I don't think any of this is useful as Flax and TensorFlow deal differently with different dtypes, and there should only be a slight adaptation of the to method.
amyeroberts left a comment
Thanks for implementing the idea in this PR. In terms of structure it looks good - just a few small comments.
I don't have a very strong opinion on how dtypes should be cast. I was originally for the float_precision flag. However, @sgugger has raised good points about using the .to API and I agree that should be focused on first.
@gante @Rocketknight1 - how useful would this be in TF land?
Thanks so much everyone for your comments! It seems that
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
I don't have a strong opinion though, so you can follow what @sgugger suggests. If we find it's useful for other frameworks, we can add them back.
Thanks everyone!
I don't think our TF models are compatible with half-precision, right @Rocketknight1? At least I haven't used TF with half-precision :D
Extremely late reply on the TF front, but yeah, we aren't really running TF models in half precision right now. We do support mixed precision (similar to Torch AMP), but we don't officially support splatting the whole model to (b)float16 yet.
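For reference, mixed precision in TF/Keras is enabled through a global dtype policy (standard tf.keras API shown purely as an illustration, unrelated to this PR's code):

```python
import tensorflow as tf

# Compute-heavy ops run in float16 while variables stay in float32,
# roughly analogous to torch AMP; this is different from casting the
# whole model to (b)float16.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    # Keep the final layer's outputs in float32 for numerical stability.
    tf.keras.layers.Dense(10, dtype="float32"),
])
```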
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
What does this PR do?
This PR introduces an input casting mechanism for image processors. Since the introduction of accelerate-supported models for Vision, I have been playing around with half-precision models. I found it a bit unintuitive to manually cast the pixel_values outside the ImageProcessor class. Therefore, for some models, small hacks have been introduced to make the casting operation more user-friendly.
With this PR, it will be possible to cast the input tensors to any floating point precision, for any framework, at the ImageProcessor level, as follows:
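A minimal usage sketch, assuming the .to-style casting discussed in the review above (the checkpoint name and image path are placeholders, not taken from the PR):

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
image = Image.open("cat.png")  # placeholder image path

# Cast the floating-point outputs (e.g. pixel_values) to half precision directly
# on the processor's output, instead of casting them manually afterwards.
inputs = processor(images=image, return_tensors="pt").to(torch.float16)
print(inputs["pixel_values"].dtype)  # torch.float16
```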
The casting skips non-floating point tensors, so these tensors are not affected by the casting mechanism (thinking e.g. of ViLT, which takes both text + image).
With this PR, the hacks introduced on ViT and OWLViT will be removed!
cc @amyeroberts @ydshieh