
Loading a model checkpoint that is trained on TPU using a GPU #2303

Closed
ArthDh opened this issue Jun 20, 2020 · 10 comments · Fixed by #4309
Labels
accelerator: tpu Tensor Processing Unit question Further information is requested

Comments

@ArthDh

ArthDh commented Jun 20, 2020

What is your question?

Is it possible to load a model that is trained on a TPU saved using ModelCheckpoint on a GPU for inference?

Code

        model = LightModel(hparams)
        trainer = pl.Trainer(resume_from_checkpoint=str(ckpt), gpus=1)
        trainer.test(model)

What have you tried?

Tried to load the weights normally, as with a GPU-trained checkpoint, but it throws an error.
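For checkpoints that contain CPU or CUDA tensors, cross-device loading is normally handled by remapping storage with `map_location` at load time. A minimal sketch (the model and file name are hypothetical, standing in for the `LightModel` checkpoint above); note this alone does not help with this issue, because the TPU checkpoint contains XLA tensors whose backend is not registered on a GPU-only machine:

```python
import torch
import torch.nn as nn

# Stand-in model; in the issue this would be LightModel(hparams).
model = nn.Linear(4, 2)
torch.save(model.state_dict(), "demo.ckpt")

# map_location remaps every tensor's storage to the given device while
# deserializing, so a checkpoint saved elsewhere loads on this machine.
state = torch.load("demo.ckpt", map_location=torch.device("cpu"))
model.load_state_dict(state)
print(next(model.parameters()).device)  # cpu
```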

What's your environment?

Kaggle GPU
torchvision==0.6.0a0+82fd1c8
torch==1.5.0
pytorch-lightning==0.8.1

@ArthDh ArthDh added the question Further information is requested label Jun 20, 2020
@Laksh1997

Have you tried to load on CPU?

@Geeks-Sid

Can you post the error?

@ArthDh
Author

ArthDh commented Jun 21, 2020

    RuntimeError: Could not run 'aten::empty_strided' with arguments from the 'XLATensorId' backend. 'aten::empty_strided' is only available for these backends: [CPUTensorId, CUDATensorId, BackendSelect, VariableTensorId].

@ArthDh
Author

ArthDh commented Jun 21, 2020

@Laksh1997 I tried, it still gives the RuntimeError

@Geeks-Sid

This looks like a PyTorch issue, and a similar one has been reported upstream. Your code looks fine; someone senior should take a look, I guess. @Borda

@stale

stale bot commented Aug 20, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the won't fix This will not be worked on label Aug 20, 2020
@Borda
Member

Borda commented Aug 21, 2020

@lezwon mind having a look? :]

@stale stale bot removed the won't fix This will not be worked on label Aug 21, 2020
@lezwon
Contributor

lezwon commented Aug 21, 2020

@ArthDh The fix for this issue is in progress here: #3044. The problem is that Lightning currently saves the model as XLA tensors instead of CPU tensors, so when you try to load them on a GPU, they cannot find an XLA device and fail.
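The fix described above can be sketched as follows, under the assumption that it amounts to moving tensors off the XLA device before serialization. On a TPU host the model would live on an XLA device (via `torch_xla`); here a plain CPU model stands in. A state dict of CPU tensors is loadable on any backend:

```python
import torch
import torch.nn as nn

# Stand-in for a model that, on a TPU host, would hold XLA tensors.
model = nn.Linear(4, 2)

# Move every tensor to CPU before saving, so the checkpoint contains
# portable CPU tensors rather than backend-specific XLA tensors.
cpu_state = {k: v.cpu() for k, v in model.state_dict().items()}
torch.save(cpu_state, "portable.ckpt")

# The checkpoint can now be restored on a machine without torch_xla:
state = torch.load("portable.ckpt", map_location="cpu")
model.load_state_dict(state)
```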

@ArthDh
Author

ArthDh commented Aug 21, 2020

@lezwon Thank you for the update!

@stale

stale bot commented Oct 22, 2020

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

@stale stale bot added the won't fix This will not be worked on label Oct 22, 2020
@lezwon lezwon added Important accelerator: tpu Tensor Processing Unit labels Oct 22, 2020
@stale stale bot removed won't fix This will not be worked on labels Oct 22, 2020
@lezwon lezwon mentioned this issue Oct 23, 2020