
Conversation

@praeclarumjj3
Contributor

What does this PR do?

Adds the Code, Documentation, and Tests for OneFormer proposed in OneFormer: One Transformer to Rule Universal Image Segmentation. I have also opened a PR to add the documentation images to huggingface/documentation-images.

I have also made changes to the ImageSegmentationPipeline to accommodate OneFormer.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@patrickvonplaten @NielsRogge

@praeclarumjj3 praeclarumjj3 changed the title Add OneFormer Model [WIP] Add OneFormer Model Dec 4, 2022
@praeclarumjj3 praeclarumjj3 force-pushed the oneformer branch 4 times, most recently from 67aa7cd to aa3ca0c Compare December 4, 2022 21:04
@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Dec 4, 2022

The documentation is not available anymore as the PR was closed or merged.

@praeclarumjj3 praeclarumjj3 marked this pull request as draft December 4, 2022 22:12
@praeclarumjj3 praeclarumjj3 force-pushed the oneformer branch 4 times, most recently from a42bb67 to 8f087ca Compare December 4, 2022 23:40
@praeclarumjj3 praeclarumjj3 marked this pull request as ready for review December 5, 2022 05:21
@NielsRogge
Contributor

Hi @praeclarumjj3, thanks a lot for your PR. It's awesome OneFormer will be available in the library (we already have MaskFormer and plan to add Mask2Former as well).

I've got 2 main points for now:

Backbones

However, there's no need to implement backbones from scratch again, as we've just added the AutoBackbone class, which allows frameworks like DETR, Mask R-CNN, and also OneFormer to be used with all vision backbones available in the library. The idea is to add an xxxBackbone class to each vision model, see for instance here for ResNet.

Next, the framework (like OneFormer) can use the AutoBackbone class as shown here for MaskFormer. This makes it possible to mix and match backbones with a given framework.

The plan is to add SwinBackbone and ConvNextBackbone next, as well as NatBackbone and DinatBackbone, which will make sure OneFormer can use them.
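As a rough sketch of the mix-and-match idea (assuming a recent transformers version that ships ResNetBackbone; the randomly initialized, checkpoint-free config here is purely for illustration):

```python
import torch
from transformers import ResNetConfig, ResNetBackbone

# Randomly initialized backbone (no pretrained weights) exposing the
# intermediate feature maps a segmentation framework consumes.
config = ResNetConfig(out_features=["stage2", "stage3", "stage4"])
backbone = ResNetBackbone(config)

pixel_values = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    outputs = backbone(pixel_values)

# One feature map per requested stage, at strides 8, 16 and 32.
shapes = [tuple(f.shape) for f in outputs.feature_maps]
print(shapes)
```

Swapping in a different backbone then only means swapping the config/class pair, while the framework keeps consuming `outputs.feature_maps` unchanged.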

Auto class

I doubt there's a need for an AutoModelForUniversalSegmentation class, as OneFormer is probably the only model that would ever be supported by it. It'd be great to make OneFormer work with our existing image segmentation pipeline (cc @Narsil). This pipeline supports instance, semantic, and panoptic segmentation, and uses the appropriate postprocess method.
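The dispatch the pipeline performs can be sketched roughly as follows (a simplified stand-in, not the actual pipeline source; the processor methods are assumed to follow the post_process_* naming MaskFormer already uses):

```python
# Simplified sketch: a single segmentation entry point routing one model
# output to the right postprocessing method per subtask.
def postprocess(outputs, image_processor, subtask):
    postprocessors = {
        "semantic": image_processor.post_process_semantic_segmentation,
        "instance": image_processor.post_process_instance_segmentation,
        "panoptic": image_processor.post_process_panoptic_segmentation,
    }
    if subtask not in postprocessors:
        raise ValueError(f"Unknown subtask: {subtask!r}")
    return postprocessors[subtask](outputs)
```

With this shape, a universal model like OneFormer needs no dedicated pipeline: one entry point serves all three tasks, selected by the requested subtask.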

I'll do a more in-depth review soon! Thanks already for all your work.

@NielsRogge NielsRogge requested a review from alaradirik December 5, 2022 10:57
Contributor

@alaradirik alaradirik left a comment


@praeclarumjj3 thank you for working on this! Seems like you already made very good progress, my main comments are:

  • As Niels suggested, you can create and/or leverage the XXXBackbone classes. The SwinBackbone PR will be merged shortly so you can just focus on the DinatBackbone class.
  • The current code is CUDA dependent (correct me if I'm wrong). I took a look at the paper and the Pixel Decoder seems very similar to that of Mask2Former (also uses multi-scale deformable attention). Perhaps you could use their PyTorch implementation to get rid of the CUDA scripts, here is the relevant Mask2Former code.
  • I think having a OneFormerForUniversalSegmentation class makes sense but we can add it to auto mapping for instance segmentation instead of creating a new mapping for simplicity.

I will do a detailed review once the custom CUDA scripts are cleaned up.

Thanks again :)
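For reference, the pure-PyTorch fallback being suggested boils down to bilinear sampling with grid_sample instead of a custom CUDA gather. A condensed sketch (shapes follow the Deformable DETR convention; details may differ from the actual Mask2Former code):

```python
import torch
import torch.nn.functional as F

def ms_deform_attn_pytorch(value, spatial_shapes, sampling_locations, attention_weights):
    # value: (batch, sum(H_l*W_l), heads, dim)
    # sampling_locations: (batch, queries, heads, levels, points, 2), in [0, 1]
    # attention_weights: (batch, queries, heads, levels, points)
    batch, _, heads, dim = value.shape
    _, queries, _, levels, points, _ = sampling_locations.shape
    value_list = value.split([h * w for h, w in spatial_shapes], dim=1)
    # grid_sample expects coordinates in [-1, 1]
    sampling_grids = 2 * sampling_locations - 1
    sampled = []
    for level, (h, w) in enumerate(spatial_shapes):
        # (batch*heads, dim, H, W)
        value_l = value_list[level].flatten(2).transpose(1, 2).reshape(batch * heads, dim, h, w)
        # (batch*heads, queries, points, 2)
        grid_l = sampling_grids[:, :, :, level].transpose(1, 2).flatten(0, 1)
        # bilinear sampling replaces the custom CUDA kernel
        sampled.append(F.grid_sample(value_l, grid_l, mode="bilinear",
                                     padding_mode="zeros", align_corners=False))
    # stack over levels -> (batch*heads, dim, queries, levels*points)
    sampled = torch.stack(sampled, dim=-2).flatten(-2)
    weights = attention_weights.transpose(1, 2).reshape(batch * heads, 1, queries, levels * points)
    out = (sampled * weights).sum(-1).view(batch, heads * dim, queries)
    return out.transpose(1, 2).contiguous()  # (batch, queries, heads*dim)
```

Numerics can differ slightly from the CUDA kernel (border handling, align_corners), which is why reference implementations keep both paths and pick one at runtime.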

@praeclarumjj3
Contributor Author

> @praateekmahajan thank you for working on this! Seems like you already made very good progress, my main comments are:
>
>   • As Niels suggested, you can create and/or leverage the XXXBackbone classes. The SwinBackbone PR will be merged shortly so you can just focus on the DinatBackbone class.
>   • The current code is CUDA dependent (correct me if I'm wrong). I took a look at the paper and the Pixel Decoder seems very similar to that of Mask2Former (also uses multi-scale deformable attention). Perhaps you could use their PyTorch implementation to get rid of the CUDA scripts, here is the relevant Mask2Former code.
>   • I think having a OneFormerForUniversalSegmentation class makes sense but we can add it to auto mapping for instance segmentation instead of creating a new mapping for simplicity.
>
> I will do a detailed review once the custom CUDA scripts are cleaned up.
>
> Thanks again :)

Thanks for the suggestions @alaradirik! I will work on using AutoBackbone classes everywhere. About the CUDA code: sure, the PyTorch implementation is already there; we just check for the presence of a GPU. I will clean up the CUDA files. Also, I believe you tagged the wrong person by mistake 😂.

> I think having a OneFormerForUniversalSegmentation class makes sense but we can add it to auto mapping for instance segmentation instead of creating a new mapping for simplicity.

I still think it's better to create a separate auto mapping class for OneFormer, as it belongs to a whole new class of architectures that use a single model for all three tasks. Is it possible for us to keep it? Hopefully, there will be follow-up works in the same direction as OneFormer's approach of training a single model.

@alaradirik
Contributor

> Thanks for the suggestions @alaradirik! I will work on using AutoBackbone classes everywhere. About the CUDA code, sure, the PyTorch code is already there,

Great, that makes things much easier then, and sorry about tagging the wrong person :)

> I still think it's better to create a different AutoMapping class for OneFormer as it belongs to a whole new class of architecture which uses a single model for all three tasks. Is it possible for us to keep it? Hopefully, there will be follow-up works in the same direction as OneFormer's approach of training a single model.

MaskFormer and Mask2Former (in progress in another PR) are also universal segmentation architectures, and I agree that new research will likely leverage the same paradigm. In retrospect, creating an auto mapping for universal segmentation and adding MaskFormer and Mask2Former along with OneFormer might be better. @NielsRogge what do you think about this?

@praeclarumjj3 praeclarumjj3 force-pushed the oneformer branch 3 times, most recently from a2c4071 to 8bbaade Compare December 7, 2022 23:55
@NielsRogge
Contributor

Thanks for all your work! Merging now.

@NielsRogge NielsRogge merged commit 5b94962 into huggingface:main Jan 19, 2023
@ydshieh
Collaborator

ydshieh commented Jan 20, 2023

Hi @praeclarumjj3, thank you for adding this model!

There are a few examples in the docstrings that are failing the CI. For example, in OneFormerForUniversalSegmentation.forward:

        >>> # you can pass them to feature_extractor for instance postprocessing
        >>> predicted_instance_map = feature_extractor.post_process_instance_segmentation(

the feature_extractor is not defined.
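This failure mode is easy to reproduce with the stdlib doctest module (the dummy class below is purely illustrative, not the actual fix, which would define the real processor in the docstring):

```python
import doctest

# The broken pattern: the example uses `feature_extractor` without ever
# defining it, so execution raises NameError and the runner records a failure.
broken = """
>>> predicted_map = feature_extractor.post_process_instance_segmentation(outputs)
"""

# The fix in spirit: define the object before the example uses it.
fixed = """
>>> class DummyExtractor:
...     def post_process_instance_segmentation(self, outputs):
...         return outputs
>>> feature_extractor = DummyExtractor()
>>> feature_extractor.post_process_instance_segmentation("outputs")
'outputs'
"""

def count_failures(source):
    test = doctest.DocTestParser().get_doctest(source, {}, "example", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test, out=lambda s: None).failed
```

Running `count_failures` on the two snippets should report one failure for the broken version and none for the fixed one.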

Would you like to fix them 🙏? If so, you can run the doctests like this:

(if you have changes in the branch, stage them first)

python3 utils/prepare_for_doc_test.py src docs

then

python3 -m pytest -v --make-reports doc_tests_gpu --doctest-modules src/transformers/models/gptj/modeling_gptj.py::transformers.models.gptj.modeling_gptj.GPTJForSequenceClassification.forward -sv --doctest-continue-on-failure --doctest-glob="*.mdx"

and also

python3 -m pytest -v --make-reports doc_tests_gpu --doctest-modules src/transformers/models/oneformer/modeling_oneformer.py::transformers.models.oneformer.modeling_oneformer.OneFormerModel.forward -sv --doctest-continue-on-failure --doctest-glob="*.mdx"

After running the doctests, discard the changes produced by prepare_for_doc_test.py, and see if you need more changes in the branch.

Don't hesitate to reach out if you have further questions, or if you can't find time for this at the moment (our team will fix it then) 🙏 Thank you!

@praeclarumjj3
Contributor Author

praeclarumjj3 commented Jan 20, 2023

Hi @ydshieh, thanks for pointing this out to me. I apologize for not fixing the docstrings in the original PR (I missed updating them after changing the code in an older commit). I have opened a new PR with the inconsistencies fixed: #21215.

Please take a look and let me know if something's still broken. And thanks for letting me know about the doctests! ✌🏻

