
Conversation

@sshleifer (Contributor) commented Sep 29, 2020

  1. Fix DDP access to model.config (we could also set self.config = model.config earlier in __init__); a sketch of the failure follows this list.
  2. Switch torch.Tensor -> torch.tensor. The latter "infers the dtype automatically".

After these changes, the command from Seq2SeqTrainer Distributed: AttributeError and the RuntimeError #7460 works.
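A minimal sketch of the first problem, assuming a hypothetical TinyModel standing in for a PreTrainedModel and an already-initialized process group (this is illustrative, not the Trainer code itself): DistributedDataParallel does not forward arbitrary attribute lookups to the wrapped module, so model.config raises once the model has been wrapped.

import torch
from torch.nn.parallel import DistributedDataParallel

# Hypothetical stand-in for a PreTrainedModel carrying a `config` attribute.
class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.config = {"pad_token_id": 0}  # placeholder for a PretrainedConfig
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        return self.linear(x)

# Assumes torch.distributed.init_process_group(...) has already been called,
# as it is before the model gets wrapped in distributed training.
model = DistributedDataParallel(TinyModel())

model.module.config  # OK: the attribute still lives on the wrapped module
model.config         # AttributeError: 'DistributedDataParallel' object has no attribute 'config'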

CC @patil-suraj, @TevenLeScao

@sshleifer requested a review from sgugger September 29, 2020 21:18
@sshleifer changed the title from "reset model.config" to "Trainer: reset model.config after calling DDP" Sep 29, 2020
@sshleifer linked an issue (#7460) Sep 29, 2020 that may be closed by this pull request
@sgugger (Collaborator) commented Sep 29, 2020

Can we see where the config is accessed (in your error message)? model.config should be accessed as sparingly as possible in Trainer so that it works with any kind of model, and I'll probably remove the requirement entirely soon.

@sshleifer changed the title from "Trainer: reset model.config after calling DDP" to "Distributed Trainer: 2 little fixes" Sep 29, 2020

# Distributed training (should be after apex fp16 initialization)
if self.args.local_rank != -1:
    config = model.config
@sgugger (Collaborator) commented Sep 29, 2020

We shouldn't assume the model has a config without a proper check; having Trainer work with models that are not PreTrainedModels is a feature that has been requested. If there is an access to config that makes the code fail, we should fix that place instead.

@sshleifer (Contributor, Author) commented Sep 30, 2020

It's already assumed that model.config exists: the base trainer.py accesses model.config 23 times, including in the statement right below this one:

https://github.com/huggingface/transformers/blob/master/src/transformers/trainer.py#L682

@sshleifer (Contributor, Author) commented:

Seq2SeqTrainer uses model.config 8 times, mostly config.pad_token_id, to avoid counting padding in the loss function.
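For context, a hedged sketch of how a pad_token_id is typically used to keep padded label positions out of the loss (the tensors and the pad id are made up for illustration; this is not the Seq2SeqTrainer code):

import torch
from torch.nn import CrossEntropyLoss

pad_token_id = 0  # would normally be read from model.config.pad_token_id
loss_fct = CrossEntropyLoss(ignore_index=pad_token_id)

logits = torch.randn(2, 5, 32)            # (batch, seq_len, vocab_size)
labels = torch.tensor([[4, 7, 2, 0, 0],   # trailing pad positions are
                       [9, 3, 0, 0, 0]])  # ignored by the loss
loss = loss_fct(logits.view(-1, logits.size(-1)), labels.view(-1))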

@sgugger (Collaborator) commented Sep 30, 2020

You should add an assert that the model is a PreTrainedModel at init, just to be clean. Then, for your specific problem, use self._actual_model() to grab the config and avoid your error (e.g., self.model.config -> self._actual_model().config).

Trainer is on its way to fully handling models without a config; see #7464.
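For illustration, a sketch of what such an unwrapping helper could look like; the name actual_model and its exact behavior are assumptions here, and the real Trainer._actual_model() may differ:

import torch
from torch.nn.parallel import DistributedDataParallel

def actual_model(model: torch.nn.Module) -> torch.nn.Module:
    # Unwrap DataParallel / DistributedDataParallel so that attributes such
    # as `config` are read from the underlying (pretrained) model.
    if isinstance(model, (torch.nn.DataParallel, DistributedDataParallel)):
        return model.module
    return model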

@sshleifer (Contributor, Author) commented:

OK, I've reduced the scope of this PR to just the torch.Tensor -> torch.tensor change.
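For reference, a quick illustration of the behavioral difference between the two constructors (not code from this PR):

import torch

print(torch.Tensor([1, 2, 3]).dtype)  # torch.float32 -- the legacy constructor always builds a float tensor
print(torch.tensor([1, 2, 3]).dtype)  # torch.int64   -- torch.tensor infers the dtype from the data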

@sgugger (Collaborator) left a comment

That works for me :-)


Successfully merging this pull request may close: Seq2SeqTrainer Distributed: AttributeError and the RuntimeError (#7460).