Conversation

@abhiwand
Contributor

@abhiwand abhiwand commented Dec 15, 2022

What does this PR do?

This PR adds a HuggingFace Transformers implementation of BridgeTower, from the paper "BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning" (https://arxiv.org/abs/2206.08657.pdf).

The paper has been accepted to AAAI-23 (https://aaai.org/Conferences/AAAI-23/).

The model's pre-trained checkpoints and configurations have been released at https://huggingface.co/BridgeTower.

The following heads have been implemented:

  • BridgeTowerForMaskedLM
  • BridgeTowerForImageAndTextRetrieval
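
For readers who want to try the retrieval head listed above, here is a minimal sketch of how it might be called. It is an illustration, not code from this PR: `BridgeTowerProcessor`, the `checkpoint` argument (a placeholder for one of the repositories under https://huggingface.co/BridgeTower), and the reading of `logits[0, 1]` as the "match" score are assumptions not stated in this thread, and the helper names `score_image_text_pairs` and `best_match` are hypothetical.

```python
from typing import List


def score_image_text_pairs(image, texts: List[str], checkpoint: str) -> List[float]:
    """Return one image-text matching score per candidate text.

    `image` is expected to be a PIL image; `checkpoint` is a placeholder for a
    BridgeTower checkpoint name (an assumption, not given in this thread).
    """
    # Imported lazily so the module loads even without torch/transformers installed.
    import torch
    from transformers import BridgeTowerForImageAndTextRetrieval, BridgeTowerProcessor

    processor = BridgeTowerProcessor.from_pretrained(checkpoint)
    model = BridgeTowerForImageAndTextRetrieval.from_pretrained(checkpoint)
    model.eval()

    scores = []
    for text in texts:
        # The processor tokenizes the text and preprocesses the image together.
        encoding = processor(image, text, return_tensors="pt")
        with torch.no_grad():
            outputs = model(**encoding)
        # Assumption: logits has shape (1, 2) and index 1 is the "match" logit.
        scores.append(outputs.logits[0, 1].item())
    return scores


def best_match(texts: List[str], scores: List[float]) -> str:
    """Return the candidate text with the highest score."""
    if len(texts) != len(scores) or not texts:
        raise ValueError("texts and scores must be non-empty and of equal length")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return texts[best_index]
```

A caller would pass a PIL image and a few candidate captions to `score_image_text_pairs`, then use `best_match` to pick the caption the model considers the closest fit.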

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

@amyeroberts @NielsRogge @ArthurZucker could you please assist with review and feedback?

@philschmid

abhiwand and others added 30 commits November 23, 2022 12:42
Contributor

@NielsRogge NielsRogge left a comment

Thanks a lot for working on this and addressing all comments!

There are still 2 comments which seem to be unaddressed; once those are resolved, it's good for me to merge.

@tileintel
Contributor

@NielsRogge Our PR keeps failing at tests/pipelines/test_pipelines_automatic_speech_recognition.py::AutomaticSpeechRecognitionPipelineTests::test_return_timestamps_in_preprocess. Could you please help us check whether the failure is caused by BridgeTower or by something else?
Thanks a lot.
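
One way to narrow down a failure like this is to run only the reported test, once on an up-to-date checkout of main and once on the PR branch. If it fails on both, the PR is unlikely to be the cause. The sketch below is an assumption about how that could be scripted, not something taken from the thread: the helper names `pytest_command` and `run_single_test` are hypothetical, and it presumes pytest is installed and that `repo_dir` points at a local clone of the repository.

```python
import subprocess
import sys
from typing import List

# Test node ID quoted in the comment above.
FAILING_TEST = (
    "tests/pipelines/test_pipelines_automatic_speech_recognition.py"
    "::AutomaticSpeechRecognitionPipelineTests"
    "::test_return_timestamps_in_preprocess"
)


def pytest_command(node_id: str) -> List[str]:
    """Build the command that runs a single pytest test with the current interpreter."""
    return [sys.executable, "-m", "pytest", node_id, "-q"]


def run_single_test(node_id: str, repo_dir: str = ".") -> bool:
    """Run one test inside repo_dir and report whether it passed (exit code 0)."""
    result = subprocess.run(pytest_command(node_id), cwd=repo_dir)
    return result.returncode == 0
```

Calling `run_single_test(FAILING_TEST)` from the repository root on each branch and comparing the two results gives a first indication of where the failure comes from.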

@amyeroberts
Contributor

@abhiwand @tileintel Thanks for addressing all of the comments! On Monday, two PRs were merged into main: one added test_image_processing_common.py (#20785), and the other updated the feature extractor references in the test_image_processing_xxx.py files (#20768). Could you please update test_image_processing_bridgetower.py to reflect these?

@tileintel
Contributor

@amyeroberts We have updated test_image_processing_bridgetower.py as you suggested. Thanks for the suggestion.
@NielsRogge @amyeroberts @sgugger We have addressed all of the comments. Thanks a lot for helping us review and approve this. We are very much looking forward to having this PR merged into main soon.

@sgugger sgugger merged commit 3a6e4a2 into huggingface:main Jan 25, 2023
@sgugger
Collaborator

sgugger commented Jan 25, 2023

Thanks again for your contribution!

@tileintel
Contributor

tileintel commented Jan 25, 2023

@sgugger Thank you for merging this PR. May I ask when the BridgeTower model will be included in a HuggingFace release, and which release that will be?
Thanks

@sgugger
Collaborator

sgugger commented Jan 25, 2023

The next release will be in roughly a month (given that the last release was yesterday).

@tileintel
Contributor

Thanks, @sgugger, for letting us know.

8 participants