
Conversation

@younesbelkada
Contributor

@younesbelkada younesbelkada commented Jan 5, 2023

What does this PR do?

Fixes: https://discuss.huggingface.co/t/finetune-blip-on-customer-dataset-20893/28446
Before this PR, it was not possible to fine-tune BLIP on a custom dataset, mainly because the code did not support "on-the-fly" right shifting of `decoder_input_ids`.
This PR also harmonizes some attributes inside `BlipForQuestionAnswering` --> I replaced `decoder_bos_token_id` by `decoder_start_token_id` to make it consistent with T5 etc.

For all VQA models, at train time we should:
1- make sure `labels` is not None
2- create `decoder_input_ids` from the labels, making sure the padding is always on the right side (see the sketch below)
3- run the forward pass on the text decoder
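A minimal sketch of step 2, modeled on the `shift_tokens_right` helper used by BART/T5-style models (illustrative code, not the exact implementation in this PR):

import torch

def shift_tokens_right(labels: torch.Tensor, pad_token_id: int, decoder_start_token_id: int) -> torch.Tensor:
    # Prepend the decoder start token and drop the last label token.
    shifted = labels.new_zeros(labels.shape)
    shifted[:, 1:] = labels[:, :-1].clone()
    shifted[:, 0] = decoder_start_token_id
    # Replace loss-masking values (-100) with the pad token so the decoder
    # never receives -100 as an input id.
    shifted.masked_fill_(shifted == -100, pad_token_id)
    return shifted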

I feel that we should probably add more tests and create a `VisualQuestionAnsweringMixin` in a follow-up PR to make sure this is done for all VQA models (I'd expect more VQA models to be added this year).

cc @NielsRogge @sgugger

Collaborator

@sgugger sgugger left a comment


Thanks for fixing this!
I'd rather say no to a new mixin, as it would go against the Transformers philosophy. We don't have one for seq2seq models, for instance; we just copy-paste the shifting logic.

@younesbelkada
Contributor Author

Perfect, thanks for clarifying @sgugger !

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jan 5, 2023

The documentation is not available anymore as the PR was closed or merged.

Comment on lines 1212 to 1214
elif decoder_input_ids is None:
    # by default use BOS token as decoder_input_ids
    decoder_input_ids = torch.LongTensor([self.decoder_start_token_id]).repeat((batch_size, 1))
Contributor

@NielsRogge NielsRogge Jan 5, 2023


I'm not sure there's a need for this.

This is handled by the generate method automatically, which will set the decoder_input_ids appropriately.

See also BART and T5, which don't have these lines.
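For example, with T5 (standard Transformers usage, nothing BLIP-specific), `generate` seeds the decoder by itself:

from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: Hello", return_tensors="pt")
# No decoder_input_ids are passed: generate() starts the decoder from
# config.decoder_start_token_id automatically.
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))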

Contributor Author


This can be removed, indeed.
For consistency with the original implementation, I propose adding a safety check that either `labels` or `decoder_input_ids` is always passed: https://github.com/salesforce/BLIP/blob/3a29b7410476bf5f2ba0955827390eb6ea1f4f9d/models/blip_vqa.py#L46
When calling the forward pass, it seems that `labels` (i.e. `answer` in the source code) is always expected.

@StevenTang1998
Contributor

@younesbelkada @sgugger Hi, thanks for contributing this code, but I found two possible bugs:

  1. The code shifts `labels` to build `decoder_input_ids` (here) and also shifts `labels` when computing the loss (here); only one of the two should be kept. I would keep the former and delete the latter (see the sketch below).
  2. The BERT tokenizer already adds a start token before the sequence, and the `_shift_right` function adds another one (the pad token), so generation should use `forced_bos_token_id` like BART.
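To make point 1 concrete, here is a self-contained sketch of the single-shift convention used by BART/T5 (dummy tensors, illustrative names only):

import torch
import torch.nn as nn

vocab_size, pad_id, start_id = 100, 0, 1
labels = torch.tensor([[5, 6, 7, -100]])  # target ids; -100 marks ignored positions

# Shift ONCE to build the decoder inputs:
decoder_input_ids = labels.clone()
decoder_input_ids[:, 1:] = labels[:, :-1]
decoder_input_ids[:, 0] = start_id
decoder_input_ids.masked_fill_(decoder_input_ids == -100, pad_id)

logits = torch.randn(1, labels.size(1), vocab_size)  # stand-in for the decoder output

# No second shift here: the logits at position t already predict labels[:, t].
loss = nn.CrossEntropyLoss(ignore_index=-100)(
    logits.view(-1, vocab_size), labels.view(-1)
)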

@StevenTang1998
Contributor

StevenTang1998 commented Jan 11, 2023

Moreover, I think the reduction of `CrossEntropyLoss` should be set to `'mean'`, otherwise you get a loss in the tens or hundreds, which is uncommon and may hurt optimization.
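To illustrate the scale difference (generic PyTorch, nothing BLIP-specific):

import torch
import torch.nn as nn

logits = torch.randn(8, 30, 30522)        # (batch, seq_len, vocab_size)
targets = torch.randint(0, 30522, (8, 30))

flat_logits = logits.view(-1, 30522)
flat_targets = targets.view(-1)

loss_sum = nn.CrossEntropyLoss(reduction="sum")(flat_logits, flat_targets)
loss_mean = nn.CrossEntropyLoss(reduction="mean")(flat_logits, flat_targets)
# 'sum' grows with the token count (240 here): roughly 2500 vs. ~10 for 'mean'.
print(loss_sum.item(), loss_mean.item())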

@NielsRogge
Contributor

Thanks for your valuable comments @StevenTang1998! @younesbelkada, in any case it would probably be best to verify this branch in a notebook on a toy image captioning dataset. Making the code as similar as possible to our other generative models (like T5, BART or GPT-2) would be great.

Comment on lines 1184 to 1185
if labels is None and decoder_input_ids is None:
    raise ValueError("Either `decoder_input_ids` or `labels` should be passed during inference.")
Contributor


It's weird that "labels" should be passed during inference?

Contributor Author


This is how it's done in the original implementation, apparently; check: https://github.com/salesforce/BLIP/blob/3a29b7410476bf5f2ba0955827390eb6ea1f4f9d/models/blip_vqa.py#L46 --> `answer`

Contributor


Hmm, I don't see it; the line you link to is inside the training branch.


Contributor


The code you link to is in the "training" mode right? So why would we have the warning that "labels should be passed during inference"? Do you mean training?

Contributor Author


Makes sense. I propose a clearer error message in 896bd63

- add colab link to documentation
- reduction = mean for loss
Contributor

@NielsRogge NielsRogge left a comment


Thanks a lot for fixing!

@younesbelkada younesbelkada merged commit 023f51f into huggingface:main Jan 18, 2023
@faiqff94

Hi @younesbelkada, I encountered the same error as mentioned by @dxlong2000.
I cloned this repository but the error is still there.

ValueError: Expected input batch_size (0) to match target batch_size (29).

@younesbelkada
Contributor Author

Hi @faiqff94,
All the issues related to BLIP training should be resolved; if you follow what was done in https://colab.research.google.com/drive/1lbqiSiA0sDF7JDWPeS0tccrM85LloVha?usp=sharing, you should not run into any issues. Can you share a reproducible script?
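For reference, the core training step in that notebook looks roughly like this (a sketch using a toy image and caption, not the notebook verbatim):

import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, text="two cats lying on a couch", return_tensors="pt")
# Passing input_ids as labels: the model shifts them right internally to
# build decoder_input_ids, which is exactly what this PR enables.
outputs = model(**inputs, labels=inputs.input_ids)
outputs.loss.backward()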

@younesbelkada younesbelkada deleted the blip-train-support branch February 20, 2023 17:48
@pribadihcr

> Hi @faiqff94 All the issues related to BLIP training should be resolved, if you follow what has been done in https://colab.research.google.com/drive/1lbqiSiA0sDF7JDWPeS0tccrM85LloVha?usp=sharing you should not get any issue. Can you share a reproducible handy script?

Hi @younesbelkada, I have tried the Colab script on a local PC, but got `loss: nan` in epoch 0.

Any advice? Thanks!
