Conversation

@ArthurZucker
Collaborator

What does this PR do?

Adds support for OPT in both Flax and TF

Who can review?

@patrickvonplaten, @LysandreJik, @younesbelkada, @patil-suraj, @sgugger

@ArthurZucker self-assigned this May 24, 2022
@ArthurZucker added the Core: Modeling label May 24, 2022
@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented May 24, 2022

The documentation is not available anymore as the PR was closed or merged.

@patrickvonplaten
Contributor

Should we close the other PR? Let me know once it's ready for a review :-)

@patrickvonplaten
Contributor

Supersedes #17227 and #17226

@patrickvonplaten
Contributor

Cool, very nice job @ArthurZucker!

Could you, as a final safety guard, also add TFOPT and FlaxOPT to the documentation test suite?

See: https://github.com/huggingface/transformers/tree/main/docs#docstring-testing

Contributor

@patrickvonplaten left a comment

Looks very nice from my side!


EXPECTED_OUTPUTS = [
    "Today is a beautiful day and I want to thank",
    "Today is a beautiful day and I want everyone",
Contributor

Great, thanks!
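For context, here is a minimal sketch of the kind of batched-generation check these expected strings back. The checkpoint, prompts, and generation length are assumptions for illustration, not taken from the PR.

from transformers import GPT2Tokenizer, TFOPTForCausalLM

EXPECTED_OUTPUTS = [
    "Today is a beautiful day and I want to thank",
    "Today is a beautiful day and I want everyone",
]

# hypothetical checkpoint and prompts; OPT reuses the GPT-2 tokenizer
model = TFOPTForCausalLM.from_pretrained("facebook/opt-125m")
tokenizer = GPT2Tokenizer.from_pretrained("facebook/opt-125m")
tokenizer.padding_side = "left"  # decoder-only models are padded on the left for batched generation

prompts = ["Today is a beautiful day and I want to", "Today is a beautiful day and I want"]
inputs = tokenizer(prompts, return_tensors="tf", padding=True)
generated = model.generate(**inputs, max_new_tokens=1)
texts = tokenizer.batch_decode(generated, skip_special_tokens=True)
print(texts)  # the integration test compares the decoded outputs against EXPECTED_OUTPUTS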

@ArthurZucker
Collaborator Author

Can I merge, @LysandreJik @sgugger? (The failing tests are not related to OPT.)

src/transformers/models/mobilebert/modeling_mobilebert.py
src/transformers/models/mobilebert/modeling_tf_mobilebert.py
src/transformers/models/opt/modeling_opt.py
src/transformers/models/opt/modeling_tf_opt.py
Contributor

Thanks!

@patrickvonplaten
Contributor

@patil-suraj, could you quickly check Flax, and @gante maybe go over TF OPT?

Contributor

@gante left a comment

There was a lot of work put into this, and it is close to completion 💪 I've added a few questions, suggestions, and corrections on the TF side, but they should be straightforward. Great work!

P.S.: double-checking: have you run the slow tests locally? If you did, it implies that TF XLA generation is working for OPT 🎉
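As a reference for what that check exercises, here is a minimal sketch of jit-compiled generation with the new TF model; the checkpoint, prompt, and generation length are illustrative assumptions, not taken from the PR.

import tensorflow as tf
from transformers import GPT2Tokenizer, TFOPTForCausalLM

model = TFOPTForCausalLM.from_pretrained("facebook/opt-125m")
tokenizer = GPT2Tokenizer.from_pretrained("facebook/opt-125m")
inputs = tokenizer("Today is a beautiful day and", return_tensors="tf")

# wrap generate in a jit-compiled tf.function so the whole generation loop runs through XLA
xla_generate = tf.function(model.generate, jit_compile=True)
outputs = xla_generate(**inputs, max_new_tokens=8)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))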

for idx, decoder_layer in enumerate(self.layers):
    if output_hidden_states:
        all_hidden_states += (hidden_states,)

Contributor

The LayerDrop, present in the PT version (L640), is missing
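For reference, the PyTorch-side logic being referred to looks roughly like this (a fragment, not runnable on its own; random is the stdlib module and the attribute names follow the PT implementation):

# LayerDrop (https://arxiv.org/abs/1909.11556): during training, whole decoder layers
# are skipped at random with probability self.layerdrop
for idx, decoder_layer in enumerate(self.layers):
    if output_hidden_states:
        all_hidden_states += (hidden_states,)

    dropout_probability = random.uniform(0, 1)
    if self.training and (dropout_probability < self.layerdrop):
        continue

    layer_outputs = decoder_layer(hidden_states, attention_mask=attention_mask)
    hidden_states = layer_outputs[0]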

Collaborator Author

Yes, it will be removed in PT as well (that will probably need another PR, WDYT @patrickvonplaten?)

)
self.assertTrue(np.allclose(output[:, :3, :3], expected_slice, atol=4e-3))

xla_generate = tf.function(model, jit_compile=True)
Contributor

😱 does this run? If it does, ignore my comment above about the _update_model_kwargs_for_xla_generation function

Collaborator Author

I did not try it yet (XLA is not compatible with the M1 chip); I will try on brutasse soon.

Contributor

@gante May 28, 2022

If it doesn't, remove the XLA compilation in the tests. It's still very brittle :)

Collaborator Author

I tested it and it seems to be working fine! 🥳

Contributor

🔥

Contributor

@gante left a comment

Good to go, from the TF end 👍

Collaborator

@sgugger left a comment

A few more nits, but LGTM otherwise. Thanks a lot!

cached_value.value = value
num_updated_cache_vectors = query.shape[1]
cache_index.value = cache_index.value + num_updated_cache_vectors
# causal mask for cached decoder self-attention: our single query position should only attend to those key positions that have already been generated and cached, not the remaining zero elements.
Collaborator

Can we respect the 119-character limit here and split that comment over several lines? ;-)
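For illustration, a wrapped version that stays under 119 characters per line could look like this:

# causal mask for cached decoder self-attention: our single query position should only attend to
# those key positions that have already been generated and cached, not the remaining zero elements.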

Collaborator Author

Okay :)

Collaborator Author

The long comment comes from BART; it has to be kept as is, otherwise the repo-consistency check fails.
We should leave it like that.
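For context, the consistency check is driven by "Copied from" markers; a marker along these lines (illustrative, not the exact line from the PR) ties the OPT code to its BART source, and make repo-consistency fails if the copied block, including its comments, drifts from the original:

# Copied from transformers.models.bart.modeling_flax_bart.FlaxBartAttention with Bart->OPT
class FlaxOPTAttention(nn.Module):
    ...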

@ArthurZucker
Collaborator Author

Thanks all for the reviews 😄 🥳

@ArthurZucker merged commit 7822a9b into huggingface:main May 31, 2022
Narsil pushed a commit to Narsil/transformers that referenced this pull request Jun 7, 2022
* initial commit
* add init file
* update globakl init
* update index and dummy objects
* style
* update modelling auto
* fix initi typo in src/transformers
* fix typo in modeling tf auto, opt was in wrong mapping name
* fixed a slow test : saved_model
* style
* fix positionnal embedding if no position id is provided
* update tf test
* update test flax requirements
* fixed serialization
* update
* update tf name to allow smooth convertion
* update flax tests
* style
* fix test typo
* fix tf typo test
* add xla for generate support in causal LM
* fixed bug
* cleaned tf tests
* style
* removed from PT for slow tests
* fix typp
* opt test as slow
* trying to fix GPT2 undefined
* correct documentation and add to test doc
* update tf doc
* fix doc
* fake commit
* Apply suggestions from code review

Co-authored-by: Joao Gante <[email protected]>

* update test based on review
* merged main layer for functionning test
* fixup + quality
* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <[email protected]>

* update long comment
* make fix copies

Co-authored-by: Arthur <[email protected]>
Co-authored-by: Joao Gante <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>
elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022
amyeroberts pushed a commit to amyeroberts/transformers that referenced this pull request Jun 16, 2022