
Conversation

@praeclarumjj3
Contributor

What does this PR do?

Adds the Code, Documentation, and Tests for OneFormer proposed in OneFormer: One Transformer to Rule Universal Image Segmentation. I have also opened a PR to add the documentation images to huggingface/documentation-images.

I have also made changes to the ImageSegmentationPipeline to accommodate OneFormer.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@patrickvonplaten @NielsRogge

@praeclarumjj3 praeclarumjj3 changed the title Add OneFormer Model [WIP] Add OneFormer Model Dec 4, 2022
@praeclarumjj3 praeclarumjj3 force-pushed the oneformer branch 4 times, most recently from 67aa7cd to aa3ca0c Compare December 4, 2022 21:04
@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Dec 4, 2022

The documentation is not available anymore as the PR was closed or merged.

@praeclarumjj3 praeclarumjj3 marked this pull request as draft December 4, 2022 22:12
@praeclarumjj3 praeclarumjj3 force-pushed the oneformer branch 4 times, most recently from a42bb67 to 8f087ca Compare December 4, 2022 23:40
@praeclarumjj3 praeclarumjj3 marked this pull request as ready for review December 5, 2022 05:21
@NielsRogge
Contributor

Hi @praeclarumjj3, thanks a lot for your PR. It's awesome OneFormer will be available in the library (we already have MaskFormer and plan to add Mask2Former as well).

I've got 2 main points for now:

Backbones

However, there's no need to implement backbones from scratch again, as we've just added the AutoBackbone class, which allows frameworks like DETR, Mask R-CNN, and also OneFormer to be used with all vision backbones available in the library. The idea is to add an xxxBackbone class to each vision model, see for instance here for ResNet.

Next, the framework (like OneFormer) can use the AutoBackbone class as shown here for MaskFormer. This makes it possible to mix and match backbones with a given framework.

The plan is to add SwinBackbone and ConvNextBackbone next, as well as NatBackbone and DinatBackbone, which will make sure OneFormer can use them.
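As a rough sketch of the mix-and-match idea (assuming a recent transformers version that ships ResNetBackbone; the randomly initialized, checkpoint-free config here is purely for illustration):

```python
import torch
from transformers import ResNetConfig, ResNetBackbone

# Randomly initialized backbone (no pretrained weights) exposing the
# intermediate feature maps a segmentation framework consumes.
config = ResNetConfig(out_features=["stage2", "stage3", "stage4"])
backbone = ResNetBackbone(config)

pixel_values = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    outputs = backbone(pixel_values)

# One feature map per requested stage, at strides 8, 16 and 32.
shapes = [tuple(f.shape) for f in outputs.feature_maps]
print(shapes)
```

Swapping in a different backbone then only means swapping the config/class pair, while the framework keeps consuming `outputs.feature_maps` unchanged.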

Auto class

I doubt there's a need for an AutoModelForUniversalSegmentation class, as OneFormer is probably the only model that would ever be supported by it. It'd be great to make OneFormer work with our existing image segmentation pipeline (cc @Narsil). This pipeline supports instance, semantic, and panoptic segmentation, and uses the appropriate postprocess method.
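The dispatch the pipeline performs can be sketched roughly as follows (a simplified stand-in, not the actual pipeline source; the processor methods are assumed to follow the post_process_* naming MaskFormer already uses):

```python
# Simplified sketch: a single segmentation entry point routing one model
# output to the right postprocessing method per subtask.
def postprocess(outputs, image_processor, subtask):
    postprocessors = {
        "semantic": image_processor.post_process_semantic_segmentation,
        "instance": image_processor.post_process_instance_segmentation,
        "panoptic": image_processor.post_process_panoptic_segmentation,
    }
    if subtask not in postprocessors:
        raise ValueError(f"Unknown subtask: {subtask!r}")
    return postprocessors[subtask](outputs)
```

With this shape, a universal model like OneFormer needs no dedicated pipeline: one entry point serves all three tasks, selected by the requested subtask.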

I'll do a more in-depth review soon! Thanks already for all your work.

@NielsRogge NielsRogge requested a review from alaradirik December 5, 2022 10:57
Contributor

@alaradirik alaradirik left a comment


@praeclarumjj3 thank you for working on this! Seems like you already made very good progress, my main comments are:

  • As Niels suggested, you can create and/or leverage the XXXBackbone classes. The SwinBackbone PR will be merged shortly so you can just focus on the DinatBackbone class.
  • The current code is CUDA dependent (correct me if I'm wrong). I took a look at the paper and the Pixel Decoder seems very similar to that of Mask2Former (also uses multi-scale deformable attention). Perhaps you could use their PyTorch implementation to get rid of the CUDA scripts, here is the relevant Mask2Former code.
  • I think having a OneFormerForUniversalSegmentation class makes sense but we can add it to auto mapping for instance segmentation instead of creating a new mapping for simplicity.

I will do a detailed review once the custom CUDA scripts are cleaned up.

Thanks again :)
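For reference, the pure-PyTorch fallback being suggested boils down to bilinear sampling with grid_sample instead of a custom CUDA gather. A condensed sketch (shapes follow the Deformable DETR convention; details may differ from the actual Mask2Former code):

```python
import torch
import torch.nn.functional as F

def ms_deform_attn_pytorch(value, spatial_shapes, sampling_locations, attention_weights):
    # value: (batch, sum(H_l*W_l), heads, dim)
    # sampling_locations: (batch, queries, heads, levels, points, 2), in [0, 1]
    # attention_weights: (batch, queries, heads, levels, points)
    batch, _, heads, dim = value.shape
    _, queries, _, levels, points, _ = sampling_locations.shape
    value_list = value.split([h * w for h, w in spatial_shapes], dim=1)
    # grid_sample expects coordinates in [-1, 1]
    sampling_grids = 2 * sampling_locations - 1
    sampled = []
    for level, (h, w) in enumerate(spatial_shapes):
        # (batch*heads, dim, H, W)
        value_l = value_list[level].flatten(2).transpose(1, 2).reshape(batch * heads, dim, h, w)
        # (batch*heads, queries, points, 2)
        grid_l = sampling_grids[:, :, :, level].transpose(1, 2).flatten(0, 1)
        # bilinear sampling replaces the custom CUDA kernel
        sampled.append(F.grid_sample(value_l, grid_l, mode="bilinear",
                                     padding_mode="zeros", align_corners=False))
    # stack over levels -> (batch*heads, dim, queries, levels*points)
    sampled = torch.stack(sampled, dim=-2).flatten(-2)
    weights = attention_weights.transpose(1, 2).reshape(batch * heads, 1, queries, levels * points)
    out = (sampled * weights).sum(-1).view(batch, heads * dim, queries)
    return out.transpose(1, 2).contiguous()  # (batch, queries, heads*dim)
```

Numerics can differ slightly from the CUDA kernel (border handling, align_corners), which is why reference implementations keep both paths and pick one at runtime.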

@praeclarumjj3
Contributor Author

> @praateekmahajan thank you for working on this! Seems like you already made very good progress, my main comments are:
>
>   • As Niels suggested, you can create and/or leverage the XXXBackbone classes. The SwinBackbone PR will be merged shortly so you can just focus on the DinatBackbone class.
>   • The current code is CUDA dependent (correct me if I'm wrong). I took a look at the paper and the Pixel Decoder seems very similar to that of Mask2Former (also uses multi-scale deformable attention). Perhaps you could use their PyTorch implementation to get rid of the CUDA scripts, here is the relevant Mask2Former code.
>   • I think having a OneFormerForUniversalSegmentation class makes sense but we can add it to auto mapping for instance segmentation instead of creating a new mapping for simplicity.
>
> I will do a detailed review once the custom CUDA scripts are cleaned up.
>
> Thanks again :)

Thanks for the suggestions @alaradirik! I will work on using AutoBackbone classes everywhere. About the CUDA code: sure, the PyTorch implementation is already there; we just check for the presence of a GPU. I will clean up the CUDA files. Also, I believe you tagged the wrong person by mistake 😂.

> I think having a OneFormerForUniversalSegmentation class makes sense but we can add it to auto mapping for instance segmentation instead of creating a new mapping for simplicity.

I still think it's better to create a separate auto mapping class for OneFormer, as it belongs to a whole new class of architectures that use a single model for all three tasks. Is it possible for us to keep it? Hopefully, there will be follow-up works in the same direction as OneFormer's approach of training a single model.

@alaradirik
Contributor

> Thanks for the suggestions @alaradirik! I will work on using AutoBackbone classes everywhere. About the CUDA code, sure, the PyTorch code is already there,

Great, that makes things much easier then, and sorry about tagging the wrong person :)

> I still think it's better to create a different AutoMapping class for OneFormer as it belongs to a whole new class of architecture which uses a single model for all three tasks. Is it possible for us to keep it? Hopefully, there will be follow-up works in the same direction as OneFormer's approach of training a single model.

MaskFormer and Mask2Former (in progress in another PR) are also universal segmentation architectures, and I agree that new research will likely leverage the same paradigm. In retrospect, creating an auto mapping for universal segmentation and adding MaskFormer and Mask2Former along with OneFormer might be better. @NielsRogge what do you think about this?

@praeclarumjj3 praeclarumjj3 force-pushed the oneformer branch 3 times, most recently from a2c4071 to 8bbaade Compare December 7, 2022 23:55
@NielsRogge
Contributor

Thanks for all your work! Merging now.

@NielsRogge NielsRogge merged commit 5b94962 into huggingface:main Jan 19, 2023
@ydshieh
Collaborator

ydshieh commented Jan 20, 2023

Hi @praeclarumjj3, thank you for adding this model!

There are a few examples in the docstrings that are failing the CI. For example, in OneFormerForUniversalSegmentation.forward:

        >>> # you can pass them to feature_extractor for instance postprocessing
        >>> predicted_instance_map = feature_extractor.post_process_instance_segmentation(

the feature_extractor is not defined.
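This failure mode is easy to reproduce with the stdlib doctest module (the dummy class below is purely illustrative, not the actual fix, which would define the real processor in the docstring):

```python
import doctest

# The broken pattern: the example uses `feature_extractor` without ever
# defining it, so execution raises NameError and the runner records a failure.
broken = """
>>> predicted_map = feature_extractor.post_process_instance_segmentation(outputs)
"""

# The fix in spirit: define the object before the example uses it.
fixed = """
>>> class DummyExtractor:
...     def post_process_instance_segmentation(self, outputs):
...         return outputs
>>> feature_extractor = DummyExtractor()
>>> feature_extractor.post_process_instance_segmentation("outputs")
'outputs'
"""

def count_failures(source):
    test = doctest.DocTestParser().get_doctest(source, {}, "example", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test, out=lambda s: None).failed
```

Running `count_failures` on the two snippets should report one failure for the broken version and none for the fixed one.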

Would you like to fix them 🙏? If so, you can run the doctests like this:

(if you have changes in the branch, stage them first)

python3 utils/prepare_for_doc_test.py src docs

then

python3 -m pytest -v --make-reports doc_tests_gpu --doctest-modules src/transformers/models/gptj/modeling_gptj.py::transformers.models.gptj.modeling_gptj.GPTJForSequenceClassification.forward -sv --doctest-continue-on-failure --doctest-glob="*.mdx"

and also

python3 -m pytest -v --make-reports doc_tests_gpu --doctest-modules src/transformers/models/oneformer/modeling_oneformer.py::transformers.models.oneformer.modeling_oneformer.OneFormerModel.forward -sv --doctest-continue-on-failure --doctest-glob="*.mdx"

After running the doctests, discard the changes produced by prepare_for_doc_test.py, and see if you need more changes in the branch.

Don't hesitate to reach out if you have further questions, or if you can't find time for this at the moment (our team will fix it then) 🙏 Thank you!

@praeclarumjj3
Contributor Author

praeclarumjj3 commented Jan 20, 2023

Hi @ydshieh, thanks for pointing this out to me. I apologize for not fixing the docstrings in the original PR (I missed updating them after changing the code in an older commit). I have opened a new PR with the inconsistencies fixed: #21215.

Please take a look and let me know if something's still broken. And thanks for letting me know about the doctests! ✌🏻

