Conversation

@NielsRogge (Contributor) commented Feb 3, 2023

What does this PR do?

This PR adds BLIP-2 to the library.

To do:

  • make sure generation works exactly as in the original implementation (maybe @gante can have a look here, based on the original code here). Edit: seems to be solved by properly setting the eos_token_id! (see the sketch after this list)
  • add more tests for BLIP-2 with AutoModelForSeq2SeqLM once the design gets approved
  • transfer checkpoints, update integration tests
  • make it possible to instantiate Blip2Config with config objects, rather than dicts (also check default text config) - will be done in a separate PR
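
For the first item, a minimal sketch of what "properly setting the eos_token_id" means at generation time. This is illustrative only, not the PR's actual fix: the checkpoint name and image URL are assumptions, and the PR may set the token id in the generation config rather than per call.

```python
# Hedged sketch: making generate() stop where the original BLIP-2
# implementation does by passing the tokenizer's EOS token id explicitly.
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

checkpoint = "Salesforce/blip2-opt-2.7b"  # illustrative checkpoint name
processor = Blip2Processor.from_pretrained(checkpoint)
model = Blip2ForConditionalGeneration.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

# Without the right eos_token_id, generation can run past the point where
# the original implementation stops.
out = model.generate(**inputs, eos_token_id=processor.tokenizer.eos_token_id)
print(processor.batch_decode(out, skip_special_tokens=True))
```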

cc @younesbelkada

@HuggingFaceDocBuilderDev commented Feb 3, 2023

The documentation is not available anymore as the PR was closed or merged.

NielsRogge requested a review from sgugger February 6, 2023 13:53
NielsRogge marked this pull request as ready for review February 6, 2023 13:59
@sgugger (Collaborator) left a comment

Thanks for working on this. The general design with the auto-model for the LM works for me, since BLIP-2 supports multiple LMs.

@NielsRogge (Contributor, Author)

@sgugger all comments are addressed, feel free to approve :)

@sgugger (Collaborator) left a comment

Thanks again for all your work on this!

@younesbelkada (Contributor) left a comment

Thanks a lot for this great addition!
accelerate support has been added in NielsRogge#54. generate + BLIP-2 with accelerate currently has some issues in the multi-GPU setting; let's address those in a follow-up PR and merge this one to at least unlock int8 loading of BLIP-2 models (for instance, nielsr/blip2-flan-t5-xl can be loaded in a Google Colab; I managed to run it on an NVIDIA T4 16GB).
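
To make the int8 path concrete, a minimal sketch, assuming bitsandbytes and accelerate are installed; the checkpoint was under nielsr/ at the time of this comment and later moved to the Salesforce org, and the prompt and image URL are illustrative:

```python
# Hedged sketch of int8 loading with bitsandbytes (load_in_8bit) and
# accelerate (device_map="auto"); this is what fits on a single 16GB T4.
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

checkpoint = "Salesforce/blip2-flan-t5-xl"
processor = Blip2Processor.from_pretrained(checkpoint)
model = Blip2ForConditionalGeneration.from_pretrained(
    checkpoint,
    load_in_8bit=True,   # quantize linear layers to int8 via bitsandbytes
    device_map="auto",   # let accelerate place the weights
)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "Question: how many cats are there? Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda:0")

out = model.generate(**inputs)
print(processor.batch_decode(out, skip_special_tokens=True))
```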

NielsRogge merged commit d7f1e7c into huggingface:main Feb 9, 2023
@vmkp commented Feb 12, 2023

@NielsRogge Curious, what is the timeline for this to make it into a stable release version?

@NielsRogge (Contributor, Author)

Usually there's a Transformers release once every 1 to 2 months, so it should make it into a stable release in March at the latest.

@sachit-menon commented Feb 17, 2023

Hi, thanks for the great work! I'm running into problems trying to use this in the multi-GPU setting, which @younesbelkada mentioned earlier -- is there an issue to follow for that? Specifically, at line 2765 of transformers/generation/utils.py the devices don't match ("Expected all tensors to be on the same device, but found at least two devices, cuda:3 and cuda:0!"), because beam_scores is on cuda:0 while next_token_scores and next_token_scores_processed are on cuda:3 after loading with device_map="auto".

I'm also getting a weirder error: a CUDA illegal memory access for any model used downstream of it on GPU 0, even when BLIP-2 is given no GPU memory on GPU 0 in max_memory. (This doesn't occur with the original BLIP-2 implementation, which I'm trying to migrate from.)
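
For reference, a hedged sketch of the kind of setup that triggers the device mismatch described above (assumes a multi-GPU machine; the checkpoint name and image URL are illustrative):

```python
# Hedged sketch: sharding BLIP-2 across several GPUs with device_map="auto"
# and running beam search, which is where the beam_scores device mismatch
# reported above shows up.
import requests
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

checkpoint = "Salesforce/blip2-opt-2.7b"  # illustrative
processor = Blip2Processor.from_pretrained(checkpoint)
model = Blip2ForConditionalGeneration.from_pretrained(
    checkpoint,
    device_map="auto",      # shards the model across all visible GPUs
    torch_dtype=torch.float16,
)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt").to("cuda:0", torch.float16)

# num_beams > 1 exercises the beam-search path in generation/utils.py where
# beam_scores lives on cuda:0 while the sharded LM's scores may not.
out = model.generate(**inputs, num_beams=5)
print(processor.batch_decode(out, skip_special_tokens=True))
```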

@xszheng2020

Same problem here @sachit-menon:
"Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!"
bitsandbytes-foundation/bitsandbytes#153

@younesbelkada (Contributor) commented Feb 20, 2023

Hi @sachit-menon @xszheng2020
This is a known issue on my end; I can confirm it should at least be fixed for blip2-opt in #21707.
Can you try checking out that branch and let us know on the PR whether the fix works? Thanks! (One standard way to check out a PR branch is sketched below.)
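
For anyone unfamiliar with testing a PR branch, a sketch of the usual git workflow for fetching a GitHub pull request locally (the local branch name here is arbitrary, not anything from the PR):

```bash
# Fetch the PR's head ref from GitHub into a local branch and install from it.
git clone https://github.com/huggingface/transformers.git
cd transformers
git fetch origin pull/21707/head:blip2-multigpu-fix   # local name is arbitrary
git checkout blip2-multigpu-fix
pip install -e .
```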

@xszheng2020

Hi @younesbelkada, thanks! I will test it on blip2-opt to see whether it works, and hope blip2-flan-t5 can be fixed soon too.
