@alaradirik
Contributor

@alaradirik alaradirik commented Dec 16, 2022

What does this PR do?

Adds Mask2Former to transformers.
Original repo: https://github.com/facebookresearch/Mask2Former/
Paper: https://arxiv.org/abs/2112.01527

Co-authored with @shivalikasingh95

To Do:

  • Fix model tests (hidden state shapes, loading the config)
  • Test model, visualize outputs
  • Update model cards

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline,
    Pull Request section?
  • [x] Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • [x] Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

shivalikasingh95 and others added 30 commits August 16, 2022 14:45
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Added Deformable DETR encoder classes from the deformable_detr implementation for the pixel decoder
Fixed Pixel Decoder Implementation
2. Added checkpoint conversion script for mask2former
3. Updated feature extractor for instance segmentation post processing
4. Doc string updates
5. config file fixes
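
For context on what "instance segmentation post processing" usually involves for a query-based model like this one: each query predicts class logits (with a trailing "no object" class) and a mask-logit map, and post-processing turns those into scored binary masks. The following is a simplified sketch under those assumptions, not the code from this PR; `postprocess_instances`, its `threshold` parameter, and the toy inputs are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess_instances(class_logits, mask_logits, threshold=0.5):
    """Turn per-query predictions into a list of instances.

    class_logits: [num_queries][num_classes + 1], last entry = "no object"
    mask_logits:  [num_queries][height][width]
    (Hypothetical helper for illustration only.)
    """
    instances = []
    for query_classes, query_mask in zip(class_logits, mask_logits):
        # Softmax over all classes, then drop the "no object" probability.
        probs = softmax(query_classes)[:-1]
        label = max(range(len(probs)), key=probs.__getitem__)
        score = probs[label]
        if score < threshold:
            continue  # query is low-confidence or mostly "no object"
        # A logit > 0 is equivalent to sigmoid(logit) > 0.5.
        binary = [[1 if v > 0 else 0 for v in row] for row in query_mask]
        if not any(any(row) for row in binary):
            continue  # skip empty masks
        instances.append({"label": label, "score": score, "mask": binary})
    return instances


# Toy input: 2 queries, 2 real classes plus "no object", 2x2 masks.
class_logits = [[4.0, 0.0, 0.0], [0.0, 0.0, 5.0]]
mask_logits = [
    [[2.0, -1.0], [-3.0, 0.5]],
    [[1.0, 1.0], [1.0, 1.0]],
]
instances = postprocess_instances(class_logits, mask_logits)
```

In this toy input the second query is dominated by the "no object" class, so only the first query survives the threshold. A real implementation would also handle batching, resizing masks to the input resolution, and overlapping masks.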
Contributor

@NielsRogge NielsRogge left a comment

Thanks a lot for working on this model! 🙏 Left some final comments.

@NielsRogge
Contributor

NielsRogge commented Jan 4, 2023

So it seems there are 2 to-dos left:

  • leverage AutoImageProcessor instead of adding a new one
  • make sure slow integration tests of Donut and Swin are still passing, possibly using MaskFormerSwin as backbone

@NielsRogge NielsRogge changed the title Add Mask2Former to transformers Add Mask2Former Jan 4, 2023
@shivalikasingh95
Contributor

shivalikasingh95 commented Jan 4, 2023

So it seems there are 2 to-dos left:

  • leverage AutoImageProcessor instead of adding a new one
  • make sure slow integration tests of Donut and Swin are still passing, possibly using MaskFormerSwin as backbone

Sure, I'll connect with @alaradirik; we'll fix these shortly and update you.

@shivalikasingh95
Contributor

@NielsRogge Just wanted to let you know that the backbone for Mask2Former has been switched to MaskFormerSwin.
The changes to modeling_swin.py and modeling_donut_swin.py have been reverted, so the slow integration tests for Donut and Swin are passing now.

All 30 checkpoints from the Mask2Former model zoo that use a Swin backbone, covering all 4 datasets and segmentation tasks, have been converted and are available on the Hub. I just need to update the model cards and will finish that shortly too.
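
For readers unfamiliar with checkpoint conversion: a conversion script typically loads the original state dict and maps each parameter name onto the naming scheme of the target model. The sketch below illustrates only that renaming step. It is not the conversion script from this PR; the `rename_keys` helper and the prefixes in `RENAME_RULES` are assumptions made up for illustration, and placeholder strings stand in for tensors.

```python
def rename_keys(state_dict, rename_rules):
    """Return a new dict where the first matching prefix of each key is replaced.

    Keys that match no rule are copied unchanged. (Hypothetical helper.)
    """
    converted = {}
    for old_key, value in state_dict.items():
        new_key = old_key
        for old_prefix, new_prefix in rename_rules:
            if old_key.startswith(old_prefix):
                new_key = new_prefix + old_key[len(old_prefix):]
                break  # apply only the first matching rule
        if new_key in converted:
            raise KeyError(f"two source keys map to {new_key!r}")
        converted[new_key] = value
    return converted


# Illustrative prefixes only; the real mapping is defined by the actual script.
RENAME_RULES = [
    ("backbone.", "model.pixel_level_module.encoder."),
    ("sem_seg_head.pixel_decoder.", "model.pixel_level_module.decoder."),
    ("sem_seg_head.predictor.", "model.transformer_module.decoder."),
]

# Placeholder strings stand in for weight tensors.
original = {
    "backbone.patch_embed.proj.weight": "w0",
    "sem_seg_head.predictor.query_embed.weight": "w1",
    "criterion.empty_weight": "w2",
}
converted = rename_keys(original, RENAME_RULES)
```

After renaming, a conversion script would normally load the result into the new model and compare its outputs against the original implementation on the same input.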

@NielsRogge
Contributor

NielsRogge commented Jan 5, 2023

Thank you!

I'm just wondering why the issue occurred only with Swin-base on one specific dataset. It would definitely be nice to clear that up. Does it have to do with the image resolution?

For instance, for UperNet (#20648), I was able to convert all checkpoints that leverage Swin-base perfectly by using our SwinBackbone. That model was ported from the mmsegmentation library, whose Swin implementation is here. So it's a bit strange. Might it be that we were just "lucky" with UperNet and OneFormer?

@alaradirik alaradirik merged commit 2411f0e into huggingface:main Jan 16, 2023
6 participants