Contributor

@NielsRogge NielsRogge commented Dec 7, 2022

What does this PR do?

This PR adds the classic UperNet framework to Transformers.

Many papers that introduce a new vision backbone, such as BEiT, ConvNeXt, and Swin, benchmark their model on downstream tasks such as semantic segmentation and object detection. All of these papers use the UperNet framework (introduced in 2018) when evaluating their backbone on semantic segmentation.

Hence, this PR implements this framework, making use of the new AutoBackbone API to make the following possible:

from transformers import SwinConfig, UperNetConfig, UperNetForSemanticSegmentation

backbone_config = SwinConfig(out_features=["stage1", "stage2", "stage3", "stage4"])

config = UperNetConfig(backbone_config=backbone_config)
model = UperNetForSemanticSegmentation(config)

In the code above, we're instantiating the UperNet framework with Swin Transformer as the backbone. The code is analogous for another backbone, such as ConvNeXt:

from transformers import ConvNextConfig, UperNetConfig, UperNetForSemanticSegmentation

backbone_config = ConvNextConfig(out_features=["stage1", "stage2", "stage3", "stage4"])

config = UperNetConfig(backbone_config=backbone_config)
model = UperNetForSemanticSegmentation(config)

To do:

  • looking into supporting from_pretrained of backbones => will be done in a follow-up PR
  • make sure UperNetImageProcessor does the exact same preprocessing
  • make UperNetImageProcessor also take segmentation_maps as optional input
  • add image processor tests
  • convert all checkpoints + update organization
  • fix integration tests

@NielsRogge NielsRogge force-pushed the add_upernet_swin_encoder branch from 41fe6b6 to f899eea Compare December 7, 2022 12:54

@NielsRogge NielsRogge force-pushed the add_upernet_swin_encoder branch from 7e880ea to 11db99f Compare December 9, 2022 08:49
@NielsRogge NielsRogge force-pushed the add_upernet_swin_encoder branch from 70736a5 to 963dc11 Compare December 14, 2022 10:17
@NielsRogge NielsRogge marked this pull request as ready for review December 14, 2022 15:48
Collaborator

@sgugger sgugger left a comment

Thanks for working on this new model. Left a couple of comments.

@NielsRogge NielsRogge force-pushed the add_upernet_swin_encoder branch from d657041 to f415981 Compare December 15, 2022 21:35
@NielsRogge NielsRogge requested a review from sgugger December 16, 2022 08:23
)
if isinstance(backbone_config, dict):
    config_class = CONFIG_MAPPING[backbone_model_type]
    backbone_config = config_class.from_dict(backbone_config)
Collaborator

Maybe raise an error if the type is then still not a PretrainedConfig?
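
One possible reading of this suggestion is sketched below. It is not the code that was merged: `resolve_backbone_config` is a hypothetical helper, and `PretrainedConfig`, `ResNetConfig`, and `CONFIG_MAPPING` are minimal stand-ins that only mimic the library objects of the same name, so that the sketch runs without transformers installed. Reading `backbone_model_type` from the dict's `"model_type"` key is also an assumption, since the excerpt above does not show where that variable is set.

```python
# Stand-ins (assumptions): in the library these come from transformers itself.
class PretrainedConfig:
    model_type = ""

    def __init__(self, **kwargs):
        self.kwargs = kwargs

    @classmethod
    def from_dict(cls, config_dict):
        return cls(**config_dict)


class ResNetConfig(PretrainedConfig):
    model_type = "resnet"


# Stand-in mapping from model type to config class.
CONFIG_MAPPING = {"resnet": ResNetConfig}


def resolve_backbone_config(backbone_config):
    """Hypothetical helper: convert a dict to a config, then validate the type."""
    if isinstance(backbone_config, dict):
        # Assumption: the model type is stored under the "model_type" key.
        backbone_model_type = backbone_config.get("model_type")
        config_class = CONFIG_MAPPING[backbone_model_type]
        backbone_config = config_class.from_dict(backbone_config)

    # The check suggested in the review: anything that is still not a
    # PretrainedConfig at this point is rejected with a clear error.
    if not isinstance(backbone_config, PretrainedConfig):
        raise TypeError(
            "backbone_config must be a dict or a PretrainedConfig, "
            f"got {type(backbone_config).__name__}"
        )
    return backbone_config
```

With this shape, a dict and an existing config object both pass through, while something like a bare string fails early instead of causing a less obvious error later in model construction.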

Contributor

@alaradirik alaradirik left a comment

Thanks for adding this!

I left a few minor comments, but everything looks good and works as expected, apart from the issues related to configuration and model parameter initialisation (plus the organization name update).

Collaborator

@sgugger sgugger left a comment

LGTM on the model initialization. One last comment on the image processor: it needs to be added to the auto mapping, and since it seems to be a full copy of SegformerImageProcessor, you should re-use SegformerImageProcessor in the auto mapping rather than introduce a new image processor here.
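
The idea of re-using an existing processor in an auto mapping can be sketched as follows. This is an illustration only: `IMAGE_PROCESSOR_NAMES` and `image_processor_name_for` are hypothetical names, not the library's internals, and the mapping holds just the two entries needed to show the point.

```python
from collections import OrderedDict

# Hypothetical stand-in for an auto mapping from model type to the name of
# the image processor class that should be loaded for it.
IMAGE_PROCESSOR_NAMES = OrderedDict(
    [
        ("segformer", "SegformerImageProcessor"),
        # Re-use the existing processor instead of adding a duplicate class.
        ("upernet", "SegformerImageProcessor"),
    ]
)


def image_processor_name_for(model_type):
    """Hypothetical lookup: return the processor class name for a model type."""
    try:
        return IMAGE_PROCESSOR_NAMES[model_type]
    except KeyError:
        raise ValueError(f"No image processor registered for {model_type!r}") from None
```

Pointing two model types at the same class name means there is only one preprocessing implementation to maintain and test.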

Collaborator

@sgugger sgugger left a comment

Thanks for all your work on this model!

@NielsRogge
Contributor Author

Thanks for the review. I'm waiting for the authors to respond regarding the creation of an organization on the hub.

@NielsRogge NielsRogge force-pushed the add_upernet_swin_encoder branch from 586eebc to 4369af8 Compare January 13, 2023 12:45
@NielsRogge NielsRogge merged commit 4ed89d4 into huggingface:main Jan 16, 2023