
Conversation

@ydshieh ydshieh commented Dec 9, 2022

What does this PR do?

  • Currently, the tiny models in pipeline tests are created with model_class(config), and for TF models the weights are not created at this point. The device is then set (which turns out to be CPU instead of GPU!), and the weights are created in a CPU context --> we get the expected exceptions.
  • In the future, we will use the tiny models from Hub repos
    • The models are loaded with from_pretrained
    • If a GPU is available, those weights are initialized in a GPU context automatically for TF models, including the embedding layers
    • As @Rocketknight1 explains in the quote below, we won't get the expected exceptions in this situation (TF, layer weights loaded in a GPU context)

In order to use the tiny models from the Hub without any pipeline test failure, we will have to skip this check in the situation described above.

From @Rocketknight1

Embedding layers in Keras have different behaviours on CPU and GPU when you pass invalid indices. On CPU, the layer checks inputs and throws an error if you're out of range, but on GPU you just get a zeros tensor as output with no error
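
As an aside, here is a minimal sketch of that behaviour (illustration only, not part of this PR's diff; the exact error type can vary across TensorFlow versions):

```python
import tensorflow as tf

def lookup_out_of_range(device):
    with tf.device(device):
        layer = tf.keras.layers.Embedding(input_dim=10, output_dim=4)
        # The first call builds the embedding weights in this device context.
        return layer(tf.constant([[42]]))  # index 42 is out of range (vocab size is 10)

try:
    lookup_out_of_range("/CPU:0")  # on CPU the invalid index raises an error
except tf.errors.InvalidArgumentError:
    print("CPU lookup raised as expected")

if tf.config.list_physical_devices("GPU"):
    # On GPU the same lookup silently returns a zeros tensor instead of raising.
    print(lookup_out_of_range("/GPU:0"))
```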

@ydshieh ydshieh requested review from Narsil and sgugger December 9, 2022 16:10
```python
with self.assertRaises(Exception):
    outputs = summarizer("This " * 1000)
# For TF models, if the weights are initialized in GPU context, we won't get expected index error from
# the embedding layer.
```
ydshieh (Collaborator, Author):
This is necessary if we want to use tiny models from the Hub for pipeline tests.

HuggingFaceDocBuilderDev commented Dec 9, 2022

The documentation is not available anymore as the PR was closed or merged.

```python
if not (
    isinstance(model, TFPreTrainedModel)
    and get_gpu_count() > 0
    and len(summarizer.model.trainable_weights) > 0
```
ydshieh (Collaborator, Author):
At the moment on main, the condition len(summarizer.model.trainable_weights) > 0 is False, as the model is not obtained from from_pretrained but from model_class(config), which doesn't create any TF weights at this point.
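
To illustrate the difference (a sketch only, using a tiny BERT config for concreteness; the actual pipeline tests pick their model classes and configs dynamically):

```python
import tensorflow as tf
from transformers import BertConfig, TFBertModel

config = BertConfig(
    hidden_size=32, num_hidden_layers=2, num_attention_heads=2, intermediate_size=37
)

# model_class(config): no TF weights exist yet at this point.
model = TFBertModel(config)
print(len(model.trainable_weights))  # 0

# The weights are only created on the first forward pass (or when loading with
# from_pretrained), in whatever device context is active at that moment.
model(tf.constant([[1, 2, 3]]))
print(len(model.trainable_weights) > 0)  # True
```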

sgugger (Collaborator):
So should the model be properly created with weights (guessing it needs a forward pass on dummy inputs)?

ydshieh (Collaborator, Author) Dec 9, 2022:
@sgugger On current main, there is no test failure around what I mentioned in this PR. We don't get the TF weights at this line; they are created inside the pipeline forward pass (which runs in a CPU context, even though we have a GPU), and we get the expected exception --> no test failure.

This PR just makes the necessary changes so that my WIP PR can be merged without failure; there, len(summarizer.model.trainable_weights) > 0 will become True --> and we need to skip the check in this case.

ydshieh (Collaborator, Author):

The PR title is somewhat misleading though, sorry.

sgugger (Collaborator):
I still don't get why it's useful to support a model with no weights here. The fix is to make sure they have weights.

ydshieh (Collaborator, Author) Dec 9, 2022:
Let me explain:

Current main:

  • pipeline tests use tiny models created dynamically
    • they are created using model_class(config)
    • they will have different weights each time the tests are launched
    • most of the tests don't check the outputs against expected values - I think the tests just make sure the pipelines can run + some other checks (but not on the outputs)
    • if I make them have weights before the pipeline forward pass, then the current CI will fail on GPU without this PR

This PR:

  • It doesn't support a model with no weights
    • When a TF model has weights + runs on GPU --> it skips the check that an exception should be raised

@ydshieh ydshieh changed the title from "Fix a logic in pipeline test regarding TF" to "Change a logic in pipeline test regarding TF" Dec 9, 2022
sgugger (Collaborator) left a comment:
ok, thanks for the explanations!

@Rocketknight1 (Member):
cc @gante to this one - he's been working on a way to check for invalid indices even for embedding layers on GPU

Narsil (Contributor) left a comment:
LGTM

@ydshieh ydshieh merged commit a12c5cb into main Dec 13, 2022
@ydshieh ydshieh deleted the fix_pipe_test_2 branch December 13, 2022 12:42
mpierrau pushed a commit to mpierrau/transformers that referenced this pull request Dec 15, 2022
* Fix the pipeline test regarding TF

* Fix the pipeline test regarding TF

* update comment

Co-authored-by: ydshieh <[email protected]>
