@NielsRogge NielsRogge commented Nov 14, 2022

What does this PR do?

This PR adds support for more backbones than just Swin in the MaskFormer framework. The MaskFormer authors released checkpoints that leverage either ResNet or Swin as backbone; however, we currently only support Swin. To support various backbones, this PR introduces the AutoBackbone API.

It introduces the following improvements:

  • adds AutoBackbone and ResNetBackbone
  • moves MaskFormerSwin to its own modeling files and adds MaskFormerSwinBackbone
  • makes MaskFormer use the AutoBackbone API so it can leverage any backbone, including ResNet

AutoBackbone API

The API is implemented as follows. For a given model, one should implement an additional class, xxxBackbone, for instance ResNetBackbone, in addition to the regular classes like xxxModel and xxxForImageClassification. The backbone class turns the xxxModel into a generic backbone to be consumed by a framework, like DETR or MaskFormer.

The API is inspired by the one used in Detectron2. This means that any backbone should implement a forward and an output_shape method:

  • the forward method returns the hidden states for each of the desired stages
  • the output_shape method returns the channel dimension + strides for each of the desired stages.

There are additional methods like size_divisibility and padding_constraints which could be added in the future; for now, they don't seem necessary.
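To make the contract concrete, here is a minimal, framework-free sketch of the backbone interface described above. The class, its stage names, and its channel/stride numbers are illustrative stand-ins (loosely modeled on ResNet-50's nominal cumulative strides), not the actual transformers implementation:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ShapeSpec:
    # Detectron2-style spec: channel count and total stride per stage.
    channels: Optional[int] = None
    height: Optional[int] = None
    width: Optional[int] = None
    stride: Optional[int] = None


class ToyBackbone:
    """Toy stand-in for an xxxBackbone class (names and numbers are
    illustrative, not the real transformers code)."""

    # (channels, cumulative stride) per stage, loosely modeled on ResNet-50.
    STAGES = {
        "stem": (64, 4), "stage1": (256, 4), "stage2": (512, 8),
        "stage3": (1024, 16), "stage4": (2048, 32),
    }

    def __init__(self, out_features):
        self.out_features = out_features

    def forward(self, height: int, width: int) -> Dict[str, Tuple[int, int, int]]:
        # Returns the (channels, h, w) of each requested feature map instead
        # of real tensors, to keep the sketch framework-free.
        return {
            name: (self.STAGES[name][0],
                   height // self.STAGES[name][1],
                   width // self.STAGES[name][1])
            for name in self.out_features
        }

    def output_shape(self) -> Dict[str, ShapeSpec]:
        # Static description of each stage: channels + stride, known
        # without running an image through the model.
        return {
            name: ShapeSpec(channels=self.STAGES[name][0],
                            stride=self.STAGES[name][1])
            for name in self.out_features
        }


backbone = ToyBackbone(out_features=["stage2", "stage3", "stage4"])
print(backbone.forward(224, 224))
print(backbone.output_shape())
```

The key design point is that output_shape() is answerable at construction time, which is what lets a consuming framework size its layers before ever seeing an image.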

Usage

An example can be found below. Basically, the user can specify which layers/stages to get the feature maps from.

```python
from transformers import ResNetConfig, ResNetBackbone
import torch

config = ResNetConfig(out_features=["stem", "stage1", "stage2", "stage3", "stage4"])
model = ResNetBackbone(config)

pixel_values = torch.randn(1, 3, 224, 224)

outputs = model(pixel_values)
for key, value in outputs.items():
    print(key, value.shape)
```

which prints:

```
stem torch.Size([1, 64, 56, 56])
stage1 torch.Size([1, 256, 56, 56])
stage2 torch.Size([1, 512, 28, 28])
stage3 torch.Size([1, 1024, 14, 14])
stage4 torch.Size([1, 2048, 7, 7])
```

One can check the output specification as follows:

```python
print(model.output_shape())
```

which prints:

```python
{'stem': ShapeSpec(channels=64, height=None, width=None, stride=2),
 'stage1': ShapeSpec(channels=256, height=None, width=None, stride=4),
 'stage2': ShapeSpec(channels=512, height=None, width=None, stride=4),
 'stage3': ShapeSpec(channels=1024, height=None, width=None, stride=4),
 'stage4': ShapeSpec(channels=2048, height=None, width=None, stride=4)}
```

This is useful for frameworks, as they often need to know the channel dimensions and strides at initialization time.
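As an illustration of that initialization-time use, here is a hedged sketch of a framework sizing per-stage 1x1 projections from the backbone's reported channel dimensions. The channel counts are taken from the ResNet-50 printout above; the projection is represented only by its (in_channels, out_channels) pair rather than a real layer:

```python
# Hypothetical consumer: a decoder head reading the backbone's shape spec
# once at __init__ time. Channel counts match the ResNet-50 example above.
backbone_channels = {"stage1": 256, "stage2": 512, "stage3": 1024, "stage4": 2048}
hidden_dim = 256  # illustrative decoder width

# One 1x1 projection per stage, described by (in_channels, out_channels).
projections = {name: (in_ch, hidden_dim) for name, in_ch in backbone_channels.items()}
print(projections)
```

In a real framework each pair would become e.g. an nn.Conv2d(in_ch, hidden_dim, kernel_size=1), but the point is the same: the layer sizes come from output_shape(), not from a forward pass.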

The Backbone API has a corresponding Auto class, which means that the following also works:

```python
from transformers import ResNetConfig, AutoBackbone

config = ResNetConfig(out_features=["stem", "stage1", "stage2", "stage3", "stage4"])
model = AutoBackbone.from_config(config)
```
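The dispatch behind such an Auto class can be sketched in plain Python: a mapping from a config's model_type to the corresponding backbone class. The mapping contents and the function name here are illustrative, not the real modeling_auto.py code:

```python
from types import SimpleNamespace

# Illustrative registry: model_type -> backbone class name (stand-in for the
# real auto mapping in transformers).
BACKBONE_MAPPING = {
    "resnet": "ResNetBackbone",
    "maskformer-swin": "MaskFormerSwinBackbone",
}


def backbone_class_for(config):
    # Auto-class-style dispatch: look up the backbone by the config's
    # model_type and fail loudly for unregistered types.
    try:
        return BACKBONE_MAPPING[config.model_type]
    except KeyError:
        raise ValueError(f"No backbone registered for model type {config.model_type!r}")


config = SimpleNamespace(model_type="resnet")  # stand-in for ResNetConfig
print(backbone_class_for(config))
```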

The AutoBackbone class also supports loading pre-trained weights, like so:

```python
from transformers import AutoBackbone

backbone = AutoBackbone.from_pretrained("microsoft/resnet-50")
```

This works because the backbone uses the same base_model_prefix as the other head models, so pre-trained weights map onto it cleanly.
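To illustrate why a shared base_model_prefix matters for loading pre-trained weights, here is a toy sketch of prefix-based checkpoint filtering; the key names and values are made up for the example:

```python
# Made-up checkpoint keys, mimicking a state dict saved by a head model
# whose base model lives under the "resnet" prefix.
checkpoint = {
    "resnet.embedder.weight": 0,
    "resnet.stages.0.weight": 0,
    "classifier.weight": 0,  # head-only parameter, not needed by a backbone
}

prefix = "resnet"

# Any class sharing the prefix can pick up the base-model weights and
# ignore the head-specific ones.
loadable = {k: v for k, v in checkpoint.items() if k.split(".", 1)[0] == prefix}
print(sorted(loadable))
```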

To do's

  • Add tests for backbones. Backbone classes should not be tested with all tests defined in test_modeling_common.py, instead they should have separate tests. Here I'd like to discuss the best way to add these tests.
  • make fixup is currently complaining about the following:
```
Exception: There were 2 failures:
MaskFormerSwinBackbone is defined in
transformers.models.maskformer.modeling_maskformer_swin but is not present in
any of the auto mapping. If that is intended behavior, add its name to
`IGNORE_NON_AUTO_CONFIGURED` in the file `utils/check_repo.py`.
ResNetBackbone is defined in transformers.models.resnet.modeling_resnet but is
not present in any of the auto mapping. If that is intended behavior, add its
name to `IGNORE_NON_AUTO_CONFIGURED` in the file `utils/check_repo.py`
```

=> However, I added both MaskFormerSwinBackbone and ResNetBackbone to modeling_auto.py, so I'm not sure why this fails. cc @sgugger

MaskFormer specifics

MaskFormer supports both ResNet and Swin as backbones. It works with native ResNets, but it does not use the native Swin implementation as backbone, which is why the library has a separate MaskFormerSwinModel class, and why this PR adds a MaskFormerSwinBackbone class.

Happy to discuss the design!

1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MarkupLM](https://huggingface.co/docs/transformers/model_doc/markuplm)** (from Microsoft Research Asia) released with the paper [MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding](https://arxiv.org/abs/2110.08518) by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei.
1. **[MaskFormer](https://huggingface.co/docs/transformers/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
1. **[MaskformerSwin](https://huggingface.co/docs/transformers/main/model_doc/maskformer-swin)** (from <FILL INSTITUTION>) released with the paper [<FILL PAPER TITLE>](<FILL ARKIV LINK>) by <FILL AUTHORS>.
@NielsRogge NielsRogge Nov 14, 2022


cc @sgugger: this entry should not be added; MaskFormerSwin is an equivalent case to DonutSwin. I tried adding ("MaskFormerSwin": "MaskFormer") to utils/check_copies.py, but no luck.

Do you know how to remove this?

A Collaborator commented:

Since I'm reading MaskformerSwin above (and not MaskFormerSwin) I'm guessing fixing the typo should be enough.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@sgugger sgugger left a comment


Thanks for working on this, but this PR is not reviewable as it is, because it tries to add too many things together. Moving the code for MaskFormer Swin outside of the MaskFormer file is a PR of its own. Adding support for the ResNet backbone is a PR of its own. Finally, adding an AutoBackbone API is also a PR of its own.

We are extremely far from being able to have an abstract class for backbones since we are just starting to use them, so let's not add one (also, Transformers doesn't really do abstract classes anyway). We should just focus on having backbone models in a first step, with one forward and the minimal number of methods needed to make the rest work.


```
@@ -0,0 +1,71 @@
# Copyright (c) Facebook, Inc. and its affiliates.
```
A Collaborator commented:

This is not our copyright. I am not in favor of using a copy-pasted file for a base utility.

```python
stride: Optional[int] = None


class Backbone(nn.Module):
```
A Collaborator commented:

The Transformers library does not use abstract classes. Especially not for a new API we haven't quite figured out yet. So let's just do backbone models for now.

```python
# Model for Instance Segmentation mapping
("maskformer-swin", "MaskFormerSwinBackbone"),
("resnet", "ResNetBackbone"),
("swin", "MaskFormerSwinBackbone"),  # for backward compatibility
```
A Collaborator commented:

This does not make any sense since we could have a SwinBackbone one of these days.

NielsRogge commented Dec 7, 2022

Closing this PR as it has been added in smaller separate PRs.

@NielsRogge NielsRogge closed this Dec 7, 2022
@sgugger (Collaborator) commented Dec 7, 2022

Thanks again for splitting it, it was really better this way!
