
Loading a model checkpoint that is trained on TPU using a GPU #2303

Closed
ArthDh opened this issue Jun 20, 2020 · 10 comments · Fixed by #4309
Labels
accelerator: tpu Tensor Processing Unit question Further information is requested

Comments

@ArthDh

ArthDh commented Jun 20, 2020

What is your question?

Is it possible to load a model that is trained on a TPU saved using ModelCheckpoint on a GPU for inference?

Code

        model = LightModel(hparams)
        trainer = pl.Trainer(resume_from_checkpoint=str(ckpt), gpus=1)
        trainer.test(model)

What have you tried?

Tried to load the weights normally, as with a GPU-trained checkpoint, but it throws an error.
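For checkpoints that contain CPU or CUDA tensors, cross-device loading is normally handled by remapping storage with `map_location` at load time. A minimal sketch (the model and file name are hypothetical, standing in for the `LightModel` checkpoint above); note this alone does not help with this issue, because the TPU checkpoint contains XLA tensors whose backend is not registered on a GPU-only machine:

```python
import torch
import torch.nn as nn

# Stand-in model; in the issue this would be LightModel(hparams).
model = nn.Linear(4, 2)
torch.save(model.state_dict(), "demo.ckpt")

# map_location remaps every tensor's storage to the given device while
# deserializing, so a checkpoint saved elsewhere loads on this machine.
state = torch.load("demo.ckpt", map_location=torch.device("cpu"))
model.load_state_dict(state)
print(next(model.parameters()).device)  # cpu
```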

What's your environment?

Kaggle GPU
torchvision==0.6.0a0+82fd1c8
torch==1.5.0
pytorch-lightning==0.8.1

@ArthDh ArthDh added the question Further information is requested label Jun 20, 2020
@Laksh1997

Have you tried to load on CPU?

@Geeks-Sid

Can you post the error?

@ArthDh
Author

ArthDh commented Jun 21, 2020

    RuntimeError: Could not run 'aten::empty_strided' with arguments from the 'XLATensorId' backend. 'aten::empty_strided' is only available for these backends: [CPUTensorId, CUDATensorId, BackendSelect, VariableTensorId].

@ArthDh
Author

ArthDh commented Jun 21, 2020

@Laksh1997 I tried, it still gives the RuntimeError

@Geeks-Sid

This looks like a PyTorch issue, and a similar one has been reported upstream. Your code looks fine; someone senior should take a look, I guess. @Borda

@stale

stale bot commented Aug 20, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the won't fix This will not be worked on label Aug 20, 2020
@Borda
Member

Borda commented Aug 21, 2020

@lezwon mind having a look? :]

@stale stale bot removed the won't fix This will not be worked on label Aug 21, 2020
@lezwon
Contributor

lezwon commented Aug 21, 2020

@ArthDh The fix for this issue is in progress here: #3044. The problem is that Lightning currently saves the model as XLA tensors instead of CPU tensors, so when you try to load them on a GPU, they cannot find an XLA device and fail.
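The fix described above can be sketched as follows, under the assumption that it amounts to moving tensors off the XLA device before serialization. On a TPU host the model would live on an XLA device (via `torch_xla`); here a plain CPU model stands in. A state dict of CPU tensors is loadable on any backend:

```python
import torch
import torch.nn as nn

# Stand-in for a model that, on a TPU host, would hold XLA tensors.
model = nn.Linear(4, 2)

# Move every tensor to CPU before saving, so the checkpoint contains
# portable CPU tensors rather than backend-specific XLA tensors.
cpu_state = {k: v.cpu() for k, v in model.state_dict().items()}
torch.save(cpu_state, "portable.ckpt")

# The checkpoint can now be restored on a machine without torch_xla:
state = torch.load("portable.ckpt", map_location="cpu")
model.load_state_dict(state)
```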

@ArthDh
Author

ArthDh commented Aug 21, 2020

@lezwon Thank you for the update!

@stale

stale bot commented Oct 22, 2020

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

@stale stale bot added the won't fix This will not be worked on label Oct 22, 2020
@lezwon lezwon added Important accelerator: tpu Tensor Processing Unit labels Oct 22, 2020
@stale stale bot removed won't fix This will not be worked on labels Oct 22, 2020
@lezwon lezwon mentioned this issue Oct 23, 2020