[InstructBlip] Add accelerate support for instructblip#24488
Conversation
|
The documentation is not available anymore as the PR was closed or merged. |
|
Thank you :) could you add an integration test? |
|
Hey @NielsRogge ! |
| _no_split_modules = [ | ||
| "InstructBlipAttention", | ||
| "T5Block", | ||
| "OPTDecoderLayer", | ||
| "InstructBlipQFormerMultiHeadAttention", | ||
| ] |
There was a problem hiding this comment.
I think this should be adapted dynamically depending on the models that are actually loaded, to be more future proof.
There was a problem hiding this comment.
Sure, makes sense
There was a problem hiding this comment.
Done, from what I can see and as a side note, InstructBlip relies on T5 or Llama only, so I did a mistake actually putting OPTDecoderLayer. This is also fixed
|
|
||
| if config.use_decoder_only_language_model: | ||
| language_model = AutoModelForCausalLM.from_config(config.text_config) | ||
| self._no_split_modules.append("LlamaDecoderLayer") |
There was a problem hiding this comment.
No, should language_model._no_split_modules here.
There was a problem hiding this comment.
I think language_model._no_split_modules should already contain LlamaDecoderLayer, the purpose of adding it in self._no_split_modules is that in from_pretrained we never look at all the child modules that contain that attribute: https://github.com/huggingface/transformers/blob/195a9e5bdb1faa58cd58b47a23a47734d2b90d8c/src/transformers/modeling_utils.py#L2807C13-L2807C29 | then gets passed to accelerate this way:
transformers/src/transformers/modeling_utils.py
Line 2814 in 195a9e5
language_model. What do you think?
There was a problem hiding this comment.
I don't understand your comment. The code loads any model from the Hub (granted it should be llama models in the actual checkpoints) but you add the specific Llama no split blocks. This will stop working if someone adds an instructblip-2 model that loads a different language model.
There was a problem hiding this comment.
A cleaner implementation maybe would be that in post_init we should add an utility method to look at all child modules that contain _no_split_modules and dynamically append them to self._no_split_modules
There was a problem hiding this comment.
If you think that's better, happy to change the PR to add these changes
There was a problem hiding this comment.
Ah I see yes, sorry I misunderstood your comment, will update that now
| self._no_split_modules.append("LlamaDecoderLayer") | ||
| else: | ||
| language_model = AutoModelForSeq2SeqLM.from_config(config.text_config) | ||
| self._no_split_modules.append("T5Block") |
| r"language_model.decoder.embed_tokens.weight", | ||
| r"language_model.lm_head.weight", | ||
| ] | ||
| _keep_in_fp32_modules = ["wo"] |
There was a problem hiding this comment.
Last comment: this is the same issue here (this comes from T5 if I'm not mistaken). This key should also be build dynamically.
There was a problem hiding this comment.
Makes sense, just updated it!
What does this PR do?
As per title, let's make users benefit from 8bit / 4bit loading of instructblip models
cc @amyeroberts @sgugger @NielsRogge
all
acceleratetests pass for this modelAs a side note, as instruct blip relies on flan-t5 as backbone for some models, therefore it is important to add
To ensure inference stability in fp16 / int8 / fp4