Remove cuda dependency from Deformable-DETR #20993
Conversation
The documentation is not available anymore as the PR was closed or merged.
sgugger left a comment:
Great, very nice to make the code simpler like this!
src/transformers/models/deformable_detr/modeling_deformable_detr.py
# CPU
output = ms_deform_attn_core_pytorch(value, spatial_shapes, sampling_locations, attention_weights)
So we're just switching to the CPU implementation here?
The Deformable DETR authors recommended using this only for debugging:
"The cuda version is much faster and more memory efficient than this pytorch version."
See thread: fundamentalvision/Deformable-DETR#9
Oh, I initially thought it was a new, vectorized function, not the CPU implementation. Indeed, let's not remove it.
In that case, can't I use the CUDA version for Mask2Former as well? In fact, even OneFormer could benefit from it, if it's okay to use the CUDA implementation of MultiScaleDeformableAttention.
cc'ing @praeclarumjj3 (OneFormer author) here.
It would probably be best to provide the custom CUDA kernels as an option rather than as the default; this way, the model can run in Google Colab, which is not the case at the moment.
We did the same for YOSO: that model has a boolean attribute in the config to determine whether or not to use the custom kernel.
@NielsRogge Agreed, it makes sense to keep it as optional. I can take up this change today for Mask2Former if you and @alaradirik think it makes sense to go ahead and add support for the CUDA implementation too. Let me know.
It is not the CPU implementation; it can still be run on GPU. The PR simply removes the custom CUDA kernels, which parallelize some of the GPU operations but also prevent users with an incompatible CUDA library version from running the model on GPU.
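For reference, the pure-PyTorch path is essentially a grid_sample-based reimplementation of multi-scale deformable attention. A sketch along these lines, assuming the tensor shapes from the Deformable DETR paper (an illustration, not necessarily the exact merged code):

```python
import torch
import torch.nn.functional as F

def ms_deform_attn_core_pytorch(value, value_spatial_shapes, sampling_locations, attention_weights):
    # value: (batch, sum(H_l * W_l), num_heads, head_dim)
    # value_spatial_shapes: list of (height, width) int pairs, one per feature level
    # sampling_locations: (batch, num_queries, num_heads, num_levels, num_points, 2), in [0, 1]
    # attention_weights: (batch, num_queries, num_heads, num_levels, num_points)
    batch, _, num_heads, head_dim = value.shape
    _, num_queries, _, num_levels, num_points, _ = sampling_locations.shape
    value_list = value.split([h * w for h, w in value_spatial_shapes], dim=1)
    sampling_grids = 2 * sampling_locations - 1  # grid_sample expects [-1, 1] coordinates
    sampled = []
    for level, (h, w) in enumerate(value_spatial_shapes):
        # (batch, H*W, heads, dim) -> (batch * heads, dim, H, W)
        value_l = value_list[level].flatten(2).transpose(1, 2).reshape(batch * num_heads, head_dim, h, w)
        # (batch, queries, heads, points, 2) -> (batch * heads, queries, points, 2)
        grid_l = sampling_grids[:, :, :, level].transpose(1, 2).flatten(0, 1)
        # bilinear interpolation of the sampled points at this feature level
        sampled.append(F.grid_sample(value_l, grid_l, mode="bilinear",
                                     padding_mode="zeros", align_corners=False))
    # (batch * heads, 1, queries, levels * points)
    attention_weights = attention_weights.transpose(1, 2).reshape(
        batch * num_heads, 1, num_queries, num_levels * num_points)
    # weight the sampled values and sum over all levels and points
    output = (torch.stack(sampled, dim=-2).flatten(-2) * attention_weights).sum(-1)
    return output.view(batch, num_heads * head_dim, num_queries).transpose(1, 2).contiguous()
```

Since every operation here is a standard PyTorch op, this path runs on CPU or GPU without compiling anything; the custom kernel instead fuses the sampling and weighting into a single pass.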
With that said, adding a boolean attribute to use the custom kernel is a good idea, but if it's not going to be the default behaviour, users would need to first download the checkpoint's configuration, edit it, and then initialize the model.
Hmm, normally if you just do:
from transformers import DeformableDetrForObjectDetection
model = DeformableDetrForObjectDetection.from_pretrained("SenseTime/deformable-detr", use_custom_kernel=False)
then it should work. By providing additional kwargs to the from_pretrained method, you can edit the configuration.
And @shivalikasingh95, sure, but let's add support for the custom kernel in a separate PR for Mask2Former and OneFormer. We can set it to False by default and add a boolean attribute for those models.
For Deformable DETR, we could change the behaviour to set it to False by default and inform users that, if they really want to use the kernel, they should set the boolean attribute to True.
@alaradirik The function will be a lot slower on GPU since it's not parallelized, so enabling the custom CUDA kernel when possible would be better.
I don't think a flag is needed in the config. Let's just not fail when loading the CUDA kernel doesn't work, and default to this non-vectorized implementation instead.
use_custom_kernel (`bool`, *optional*, defaults to `False`):
    Whether to use the custom CUDA kernel to speed up inference and training on GPU.
As said in the thread, let's not use a config parameter for this, but simply try to use the fast version and default to the slow one if the fast one fails. If a user sets it to True and then pushes their model, anyone without a GPU, or with a PyTorch installation that lacks CUDA, will get a failure when using this model.
Config parameters are usually bad for flags that depend on hardware/setup.
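A minimal sketch of that try-and-fall-back pattern, reusing the pure-PyTorch function from the sketch above; the loader name and kernel entry point are assumptions for illustration, not the actual transformers symbols:

```python
# Try to load the custom CUDA op once at import time; any failure (no GPU,
# missing compiler, incompatible CUDA/PyTorch versions) silently falls back.
try:
    MSDA = load_cuda_kernels()  # hypothetical helper that builds/loads the extension
except Exception:
    MSDA = None

def deformable_attention(value, spatial_shapes, sampling_locations, attention_weights):
    if MSDA is not None and value.is_cuda:
        # fast path: fused custom CUDA kernel (hypothetical entry point)
        return MSDA.ms_deform_attn_forward(value, spatial_shapes, sampling_locations, attention_weights)
    # slow path: the pure-PyTorch implementation, runs on CPU or GPU
    return ms_deform_attn_core_pytorch(value, spatial_shapes, sampling_locations, attention_weights)
```

This keeps checkpoints portable: the same config loads whether or not the environment can build the kernel, avoiding exactly the failure mode a pushed use_custom_kernel=True config would create.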
Ok, I don't think any changes to the main branch are needed then. Should I go ahead and close the PR?
If it's all good on main, then yes :-)
What does this PR do?
Removes the CUDA dependency from Deformable DETR. The OneFormer and Mask2Former PRs also use the same multi-scale deformable attention function and eliminate the CUDA dependency.