Conversation

Contributor

@alaradirik commented Jan 4, 2023

What does this PR do?

Removes the CUDA dependency from Deformable DETR. The OneFormer and Mask2Former PRs also use the same multi-scale deformable attention function and eliminate the CUDA dependency.
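For context, the pure-PyTorch path (the ms_deform_attn_core_pytorch function discussed below) emulates the custom kernel with one bilinear grid_sample call per feature level. A minimal sketch, closely following the original Deformable DETR reference implementation (tensor names and the exact signature here are illustrative):

import torch
import torch.nn.functional as F

def ms_deform_attn_core_pytorch(value, spatial_shapes, sampling_locations, attention_weights):
    # value: (batch, sum(h*w over levels), num_heads, head_dim)
    # spatial_shapes: list of (h, w) pairs, one per feature level
    # sampling_locations: (batch, num_queries, num_heads, num_levels, num_points, 2), in [0, 1]
    # attention_weights: (batch, num_queries, num_heads, num_levels, num_points)
    batch, _, num_heads, head_dim = value.shape
    _, num_queries, _, num_levels, num_points, _ = sampling_locations.shape
    value_list = value.split([h * w for h, w in spatial_shapes], dim=1)
    sampling_grids = 2 * sampling_locations - 1  # grid_sample expects [-1, 1] coordinates
    sampled = []
    for level, (h, w) in enumerate(spatial_shapes):
        # (batch * num_heads, head_dim, h, w)
        value_l = value_list[level].flatten(2).transpose(1, 2).reshape(batch * num_heads, head_dim, h, w)
        # (batch * num_heads, num_queries, num_points, 2)
        grid_l = sampling_grids[:, :, :, level].transpose(1, 2).flatten(0, 1)
        # bilinear sampling replaces the hand-written CUDA gather
        sampled.append(F.grid_sample(value_l, grid_l, mode="bilinear", padding_mode="zeros", align_corners=False))
    weights = attention_weights.transpose(1, 2).reshape(batch * num_heads, 1, num_queries, num_levels * num_points)
    output = (torch.stack(sampled, dim=-2).flatten(-2) * weights).sum(-1)
    return output.view(batch, num_heads * head_dim, num_queries).transpose(1, 2).contiguous()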

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [x] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

@HuggingFaceDocBuilderDev commented Jan 4, 2023

The documentation is not available anymore as the PR was closed or merged.

Collaborator

@sgugger left a comment

Great, very nice to make the code simpler like this!

# CPU
output = ms_deform_attn_core_pytorch(value, spatial_shapes, sampling_locations, attention_weights)

Contributor

So we're just switching to the CPU implementation here?

The Deformable DETR authors recommended using this only for debugging:

The cuda version is much faster and more memory efficient than this pytorch version.

See thread: fundamentalvision/Deformable-DETR#9

Collaborator

Oh I initially thought it was a new function that was vectorized, not the CPU implementation. Let's not remove it indeed.

Contributor

@shivalikasingh95 Jan 4, 2023

In that case, can't I use the CUDA version for Mask2Former as well? In fact, even OneFormer could benefit from it, if it's okay to use the CUDA implementation of MultiScaleDeformableAttention.

Contributor

cc'ing @praeclarumjj3 (OneFormer author) here.

It would probably be best to provide the custom CUDA kernels as an option, rather than by default. This way, we can run the model in Google Colab, which is not the case at the moment.

We did the same for YOSO. This model has a boolean attribute in the config to determine whether or not to use the custom kernel.
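For illustration, the opt-in pattern could look roughly like this (use_custom_kernel is a hypothetical attribute name here, and YOSO's actual attribute name may differ; extra kwargs passed to a transformers config are stored as attributes):

from transformers import Mask2FormerConfig, Mask2FormerForUniversalSegmentation

# Hypothetical flag mirroring the YOSO pattern: keep the pure-PyTorch path
# as the default so the model runs anywhere, and let users opt in explicitly.
config = Mask2FormerConfig(use_custom_kernel=False)
model = Mask2FormerForUniversalSegmentation(config)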

Contributor

@NielsRogge Agreed, it makes sense to keep it as optional. I can take up this change today for Mask2Former if you and @alaradirik think it makes sense to go ahead and add support for the CUDA implementation too. Let me know.

Contributor Author

@alaradirik Jan 5, 2023

It is not the CPU implementation; it can still be run on GPU. The PR simply removes the custom CUDA kernels, which parallelize some of the GPU operations but also prevent users with an incompatible CUDA library version from running the model on GPU.

Contributor Author

With that said, adding a boolean attribute to use the custom kernel is a good idea, but if it's not the default behaviour, users would need to first download the checkpoint's configuration, edit it, and then initialize the model.

@NielsRogge @sgugger

Contributor

Hmm normally if you just do:

from transformers import DeformableDetrForObjectDetection

model = DeformableDetrForObjectDetection.from_pretrained("SenseTime/deformable-detr", use_custom_kernel=False) 

then it should work. Additional kwargs passed to the from_pretrained method override the corresponding configuration attributes, so there's no need to download and edit the config manually.

Contributor

@NielsRogge Jan 5, 2023

And @shivalikasingh95, sure, but let's add support for the custom kernel in a separate PR for Mask2Former and OneFormer. We can set it to False by default and add a boolean attribute for those models.

For Deformable DETR, we could change the behaviour to set it to False by default and inform users that, if they really want to use the kernel, they should set the boolean attribute to True.

Collaborator

@alaradirik The function will be a lot slower on GPU since it's not parallelized, so enabling the custom CUDA kernel when possible would be better.

I don't think a flag is needed in the config. Let's just not fail if loading the CUDA kernel doesn't work, and default to this non-vectorized implementation.
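A minimal sketch of that fallback, assuming a load_cuda_kernels helper that compiles and loads the extension (the helper and function names are illustrative, not necessarily the final API):

# Try to build/load the custom CUDA extension once at import time; if that
# fails (no GPU, mismatched CUDA toolkit, missing compiler), fall back to
# the pure-PyTorch implementation without raising.
try:
    MultiScaleDeformableAttentionFunction = load_cuda_kernels()
    kernel_available = True
except Exception:
    kernel_available = False

def ms_deform_attn(value, spatial_shapes, sampling_locations, attention_weights):
    if kernel_available and value.is_cuda:
        # fast path: hand-written, parallelized CUDA kernel
        return MultiScaleDeformableAttentionFunction.apply(
            value, spatial_shapes, sampling_locations, attention_weights
        )
    # slow but dependency-free path (works on both CPU and GPU)
    return ms_deform_attn_core_pytorch(value, spatial_shapes, sampling_locations, attention_weights)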

Comment on lines +119 to +120
use_custom_kernel (`bool`, *optional*, defaults to `False`):
Whether to use custom CUDA kernel to speed up inference and training on GPU.
Collaborator

As said in the thread, let's not use a config parameter for this, but just try to use the fast version and default to the slow one if the fast one fails. If a user sets it to True and then pushes their model, anyone without a GPU, or with a PyTorch installation that lacks CUDA support, will get a failure when using this model.

Config parameters are usually a bad fit for flags that depend on hardware or setup.

Contributor Author

OK, I don't think any changes to main are needed then. Should I go ahead and close the PR?

Collaborator

If it's all good on main, then yes :-)
