[MaskFormer] PoC of AutoBackbone API to support ResNet + Swin #20204
| 1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team. | ||
| 1. **[MarkupLM](https://huggingface.co/docs/transformers/model_doc/markuplm)** (from Microsoft Research Asia) released with the paper [MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding](https://arxiv.org/abs/2110.08518) by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei. | ||
| 1. **[MaskFormer](https://huggingface.co/docs/transformers/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov. | ||
| 1. **[MaskformerSwin](https://huggingface.co/docs/transformers/main/model_doc/maskformer-swin)** (from <FILL INSTITUTION>) released with the paper [<FILL PAPER TITLE>](<FILL ARKIV LINK>) by <FILL AUTHORS>. |
cc @sgugger this entry should not be added: MaskFormerSwin is equivalent to the DonutSwin case. However, I tried adding ("MaskFormerSwin": "MaskFormer") to utils/check_copies.py, but no luck.
Do you know how to remove this?
Since I'm reading MaskformerSwin above (and not MaskFormerSwin) I'm guessing fixing the typo should be enough.
sgugger left a comment
Thanks for working on this, but this PR is not reviewable as it is because it tries to add too many things together. Moving the code for MaskFormer Swin outside of the MaskFormer file is a PR of its own. Adding support for the ResNet backbone is a PR of its own. Finally, adding an AutoBackbone API is also a PR of its own.
We are extremely far from being able to have an abstract class for backbones since we are just starting to use them, so let's not add one (also, Transformers doesn't really do abstract classes anyway). As a first step, we should just focus on having backbone models with one forward and the minimal number of methods needed to make the rest work.
| @@ -0,0 +1,71 @@ | |||
| # Copyright (c) Facebook, Inc. and its affiliates. | |||
This is not our copyright. I am not in favor of using a copy-pasted file for a base utility.
| stride: Optional[int] = None | ||
| class Backbone(nn.Module): |
The Transformers library does not use abstract classes. Especially not for a new API we haven't quite figured out yet. So let's just do backbone models for now.
| # Model for Instance Segmentation mapping | ||
| ("maskformer-swin", "MaskFormerSwinBackbone"), | ||
| ("resnet", "ResNetBackbone"), | ||
| ("swin", "MaskFormerSwinBackbone"), # for backward compatibility |
This does not make any sense since we could have a SwinBackbone one of these days.
Closing this PR as it has been added in smaller separate PRs.
Thanks again for splitting it, it was really better this way!
What does this PR do?
This PR adds support for more backbones than just Swin for the MaskFormer framework. The MaskFormer authors released checkpoints that leverage either ResNet or Swin as backbones; however, we currently only support Swin. To support various backbones, this PR introduces the AutoBackbone API.
It introduces the following improvements:
AutoBackbone API
The API is implemented as follows. For a given model, one should implement an additional class, `xxxBackbone` (for instance `ResNetBackbone`), in addition to the regular classes like `xxxModel` and `xxxForImageClassification`. The backbone class turns the `xxxModel` into a generic backbone to be consumed by a framework, like DETR or MaskFormer.
The API is inspired by the one used in Detectron2. This means that any backbone should implement a `forward` and an `output_shape` method:
- the `forward` method returns the hidden states for each of the desired stages
- the `output_shape` method returns the channel dimension + strides for each of the desired stages.
There are additional methods like `size_divisibility` and `padding_constraints` which could be added in the future; for now they don't seem necessary.
Usage
An example can be found below. Basically, the user can specify which layers/stages to get the feature maps from.
which prints:
One can check the output specification as follows:
which prints:
This is useful for frameworks, as they often need to know these things at initialization.
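The `forward` / `output_shape` contract described above can be sketched without any deep learning dependency. This is only an illustration under stated assumptions, not the code from this PR: `ShapeSpec` is loosely modeled on the Detectron2 class of the same name, while `ToyBackbone`, `ToyPixelDecoder`, the stage names, and the channel/stride values are hypothetical. To stay dependency-free, `forward` returns the shapes the feature maps would have rather than real tensors.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ShapeSpec:
    """Hypothetical per-stage spec: channel count and stride relative to the input."""
    channels: int
    stride: int


class ToyBackbone:
    """Stand-in for an `xxxBackbone`; a real one would be a torch module."""

    # Illustrative stage layout, not read from any checkpoint.
    _STAGES = {
        "stem": ShapeSpec(64, 4),
        "stage1": ShapeSpec(256, 4),
        "stage2": ShapeSpec(512, 8),
        "stage3": ShapeSpec(1024, 16),
        "stage4": ShapeSpec(2048, 32),
    }

    def __init__(self, out_features: List[str]):
        unknown = [name for name in out_features if name not in self._STAGES]
        if unknown:
            raise ValueError(f"Unknown stages: {unknown}")
        self.out_features = list(out_features)

    def output_shape(self) -> Dict[str, ShapeSpec]:
        # Channels and stride for each requested stage, available before any forward pass.
        return {name: self._STAGES[name] for name in self.out_features}

    def forward(self, height: int, width: int) -> Dict[str, Tuple[int, int, int]]:
        # One entry per requested stage: the (channels, H, W) shape of its feature map.
        return {
            name: (spec.channels, height // spec.stride, width // spec.stride)
            for name, spec in self.output_shape().items()
        }


class ToyPixelDecoder:
    """A framework component that sizes itself from the backbone spec at initialization."""

    def __init__(self, backbone: ToyBackbone):
        self.in_channels = [spec.channels for spec in backbone.output_shape().values()]


backbone = ToyBackbone(out_features=["stage1", "stage4"])
feature_shapes = backbone.forward(224, 224)
decoder = ToyPixelDecoder(backbone)
```

The point of the design is that `ToyPixelDecoder` never needs to run the backbone to learn how many input channels to expect; it reads them from `output_shape()` when it is constructed.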
The Backbone API has a corresponding Auto class, which means that the following also works:
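As a rough, library-free picture of what such an Auto class does, it can be thought of as a lookup from a model type string to a backbone class. This is a sketch and not the PR's implementation: `TOY_BACKBONE_MAPPING`, `ToyAutoBackbone`, the `Toy*Backbone` classes, and the plain-dict config are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


class ToyResNetBackbone:
    """Hypothetical stand-in for a ResNet backbone class."""

    def __init__(self, out_features: Optional[List[str]] = None):
        self.out_features = out_features or []


class ToyMaskFormerSwinBackbone:
    """Hypothetical stand-in for the MaskFormer-specific Swin backbone class."""

    def __init__(self, out_features: Optional[List[str]] = None):
        self.out_features = out_features or []


# Model type -> backbone class (assumed layout for this sketch).
TOY_BACKBONE_MAPPING: Dict[str, type] = {
    "maskformer-swin": ToyMaskFormerSwinBackbone,
    "resnet": ToyResNetBackbone,
}


class ToyAutoBackbone:
    """Picks the backbone class that matches the config's model type."""

    @classmethod
    def from_config(cls, config: dict):
        model_type = config.get("model_type")
        if model_type not in TOY_BACKBONE_MAPPING:
            raise ValueError(
                f"Unrecognized model type {model_type!r}; "
                f"known types: {sorted(TOY_BACKBONE_MAPPING)}"
            )
        backbone_cls = TOY_BACKBONE_MAPPING[model_type]
        return backbone_cls(out_features=config.get("out_features"))


resnet_backbone = ToyAutoBackbone.from_config(
    {"model_type": "resnet", "out_features": ["stage1", "stage4"]}
)
```

In this sketch `"swin"` is intentionally left unmapped, so an unsupported type fails loudly instead of silently resolving to another model's backbone.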
The AutoBackbone class also supports loading pre-trained weights, like so:
This works because the backbone uses the same `base_model_prefix` as other head models.
To do's
test_modeling_common.py, instead they should have separate tests. Here I'd like to discuss the best way to add these tests.
=> however I added both `MaskFormerSwinBackbone` and `ResNetBackbone` to modeling_auto.py, so not sure why this fails. cc @sgugger
MaskFormer specifics
MaskFormer supports both ResNet and Swin as backbone. It does support native ResNets, but it doesn't use the native Swin as backbone, which is why we have a separate `MaskFormerSwinModel` class in the library, as well as a `MaskFormerSwinBackbone` class in this PR.
Happy to discuss the design!