Add TF port of BLIP#22090
Conversation
|
The documentation is not available anymore as the PR was closed or merged. |
|
The TF port is mostly complete now and tests are passing locally - I just need to go around updating docs and auto classes and so on. The main code should be ready for review! |
fb88fd4 to
120d189
Compare
amyeroberts
left a comment
There was a problem hiding this comment.
Nice 🔥 Thanks for adding this model!
Mostly nits. The main comment is question whether TODO comments have been resolved.
gante
left a comment
There was a problem hiding this comment.
Oof, this is a long one -- good job on getting it to the finish line! It should be close to a mergeable state.
A few general comments:
- Missing: new modeling files in doctests;
- The PR has some minor issues that came from the PT implementation. It would be nice to correct them as well!
- The
trainingargument is missing 😱 It needs to be added throughout the code.
| ("bert", "TFBertModel"), | ||
| ("blenderbot", "TFBlenderbotModel"), | ||
| ("blenderbot-small", "TFBlenderbotSmallModel"), | ||
| ("blip", "TFBlipModel"), |
There was a problem hiding this comment.
I think we are missing a few auto classes -- also missing on the PT side!
|
Looks like there are many comments to address for now. Please ping me again when it's ready for second review! |
|
Got through a lot of the comments today, but I have a couple of other things to do - will try to finish them tomorrow! |
|
The last remaining big issue is that some of the pt-tf equivalence tests fail when weights don't match up between models. This is caused by the cross-attention weights not being built, presumably because those layers aren't being called in the forward pass. I'm working on figuring out why and resolving that! |
|
The issue seems to be that in all of our other models, cross-attention layers are only added when |
gante
left a comment
There was a problem hiding this comment.
Looks good!
Two high-level items from the previous review remaining:
- Missing: new modeling files in doctests;
- The training argument is missing 😱 It needs to be added throughout the code.
|
It's coming, don't worry! This cross-attention behaviour is just very odd and I'm trying to track it down first |
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
|
Hi all! I've addressed all comments and local tests look good. The remaining issues are:
I'm not sure about the auto classes, though - they're missing in the original PT version of the model as well, so this didn't seem like the right PR to add them. |
|
cc @sgugger - I think this is ready for a final review at last! |
sgugger
left a comment
There was a problem hiding this comment.
Thanks for your PR! It sadly cannot be merged until the pt/tf equivalence tests are all passing. There is no model tester that skips them in the code base, so let's not BLIP be the first one.
If the fact BLIP is an encoder/decoder make changes necessary to the base tests in the model tester classes. The test can be overwritten.
| """ | ||
| return cls(config, **kwargs) | ||
|
|
||
| def invert_attention_mask(self, encoder_attention_mask: tf.Tensor) -> tf.Tensor: |
There was a problem hiding this comment.
This does not use the state, so better put this as a function in tf_utils. (same for the other two below)
We should probably cleanup the PyTorch side to do the same.
There was a problem hiding this comment.
Done! I didn't touch the PyTorch side yet because that's a bigger refactor that touches several models, but I can do it in another PR after this if you want.
| @unittest.skip(reason="This test class covers encoder-decoder models that the base test does not work with.") | ||
| def test_pt_tf_model_equivalence(self): | ||
| pass |
There was a problem hiding this comment.
Needs to be rewritten then. We cannot skip this test.
| self.assertIsNotNone(model) | ||
|
|
||
| @unittest.skip(reason="This test class covers encoder-decoder models that the base test does not work with.") | ||
| def test_pt_tf_model_equivalence(self): |
| self.assertIsNotNone(model) | ||
|
|
||
| @unittest.skip(reason="This test class covers encoder-decoder models that the base test does not work with.") | ||
| def test_pt_tf_model_equivalence(self): |
|
Got it, I'll figure out some way to re-enable those tests, or override them with versions that do work! |
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
|
@sgugger this should be ready for review with all comments addressed! The failing test is in an unrelated model |
sgugger
left a comment
There was a problem hiding this comment.
Still some equivalence tests missing.
| self._override_model_class = override_model_class | ||
|
|
||
| def get_inputs(self, pt_model, config): | ||
| def get_inputs(self, pt_model, tf_dummy_inputs, config): |
There was a problem hiding this comment.
The changes here seem unrelated to this PR and would be better in their own PR, no?
There was a problem hiding this comment.
Fair! I added them because they were needed for the pt-to-tf code to port the BLIP models correctly. If you'd rather I move them to a separate PR though, that's fine!
| @unittest.skip(reason="This test class covers encoder-decoder models that the base test does not work with.") | ||
| def test_pt_tf_model_equivalence(self): | ||
| pass |
| self.assertIsNotNone(model) | ||
|
|
||
| @unittest.skip(reason="This test class covers encoder-decoder models that the base test does not work with.") | ||
| def test_pt_tf_model_equivalence(self): |
| self.assertIsNotNone(model) | ||
|
|
||
| @unittest.skip(reason="This test class covers encoder-decoder models that the base test does not work with.") | ||
| def test_pt_tf_model_equivalence(self): |
|
@sgugger Sorry for the confusion - that equivalence test is present in both the |
|
Yes we do. |
sgugger
left a comment
There was a problem hiding this comment.
Failing tests are unrelated.
gante
left a comment
There was a problem hiding this comment.
(pt-to-tf changes LGTM 👍 )
|
Going to leave the |
* Initial commit * more stash commit * Yet another stash commit * yet more stash commit * Mostly working except for docs / repo consistency * Stop importing model list from torch file * Add TF BLIP models to docs * Add auto classes * Move get_text_features and get_image_features * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip_text.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/blip/test_modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/blip/test_modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update tests/models/blip/test_modeling_tf_blip_text.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip_text.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Use channels_last convolutions in TF (better performance + compatibility) * Remove _shape function * Move multi-line statement to one line in PT + TF * Specify tf.keras.layers instead of importing from it * Remove test_gradient_checkpointing and empty test_training methods * move some multi-line statements to one line * Update docstring for generate * Remove pruned heads set * Remove self.seq_len_dim * Fixed issues with loss computation, should resolve some tests. Also ensured that the PT version follows the config for output_attentions and output_hidden_states * ensure original model follows config in more cases * Skip the same cross-attention tests in the PT tests - didn't realize we did it twice! * Add training args throughout the models and layers * make fixup * Fix docstring for inputs_embeds * Add docstring for is_decoder * Add docstrings to text models * Remove redundant computation * Add unpack_inputs / keras_serializable * Add modeling_tf_blip to doctests * Add config classes for keras serialization * Changes to allow model porting with pt-to-tf * Quick fix to decoder head and test tweaks * Revert an issue with masking the embeddings outputs * Allow missing keys in some equivalence tests (for unused layers) * Add tf-pt equivalence tests back in * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip_text.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip_text.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * make fixup * Refactor invert_attention_mask out into tf_utils * Re-enable cross-tests on the PT side too --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Initial commit * more stash commit * Yet another stash commit * yet more stash commit * Mostly working except for docs / repo consistency * Stop importing model list from torch file * Add TF BLIP models to docs * Add auto classes * Move get_text_features and get_image_features * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip_text.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/blip/test_modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/blip/test_modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update tests/models/blip/test_modeling_tf_blip_text.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip_text.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Use channels_last convolutions in TF (better performance + compatibility) * Remove _shape function * Move multi-line statement to one line in PT + TF * Specify tf.keras.layers instead of importing from it * Remove test_gradient_checkpointing and empty test_training methods * move some multi-line statements to one line * Update docstring for generate * Remove pruned heads set * Remove self.seq_len_dim * Fixed issues with loss computation, should resolve some tests. Also ensured that the PT version follows the config for output_attentions and output_hidden_states * ensure original model follows config in more cases * Skip the same cross-attention tests in the PT tests - didn't realize we did it twice! * Add training args throughout the models and layers * make fixup * Fix docstring for inputs_embeds * Add docstring for is_decoder * Add docstrings to text models * Remove redundant computation * Add unpack_inputs / keras_serializable * Add modeling_tf_blip to doctests * Add config classes for keras serialization * Changes to allow model porting with pt-to-tf * Quick fix to decoder head and test tweaks * Revert an issue with masking the embeddings outputs * Allow missing keys in some equivalence tests (for unused layers) * Add tf-pt equivalence tests back in * Update src/transformers/models/blip/modeling_tf_blip.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip_text.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/blip/modeling_tf_blip_text.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * make fixup * Refactor invert_attention_mask out into tf_utils * Re-enable cross-tests on the PT side too --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Work in progress right now, will update this when it's closer to being ready!