When I move checkpoints from Ubuntu (Azure server) to Windows (my local system), I get an error #162
Comments
Try saving the checkpoint in text format, then load on Windows and convert it back to binary.
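A minimal sketch of that round trip, assuming the checkpoint can be deserialized with `nn` and torch-rnn's `LanguageModel` loaded (as `sample.lua` does) and that the script is run from the torch-rnn directory. The script name `convert_checkpoint.lua`, the `convert` helper, and the file paths are hypothetical; only `torch.load` and `torch.save` with their `'binary'`/`'ascii'` format argument come from Torch itself.

```lua
-- convert_checkpoint.lua: hypothetical helper, not part of torch-rnn.
require 'torch'
require 'nn'
-- Assumption: torch-rnn's classes must be loadable to deserialize the model;
-- pcall keeps the script usable for plain tables and tensors as well.
pcall(require, 'LanguageModel')

-- Load `src` in `src_format` and write it to `dst` in `dst_format`.
-- Both formats are either 'binary' or 'ascii'.
local function convert(src, dst, src_format, dst_format)
  local checkpoint = torch.load(src, src_format)
  torch.save(dst, checkpoint, dst_format)
end

-- Command-line use (paths are placeholders):
--   on Ubuntu:  th convert_checkpoint.lua in.t7 out_ascii.t7 binary ascii
--   on Windows: th convert_checkpoint.lua out_ascii.t7 out.t7 ascii binary
if arg and arg[1] and arg[2] and arg[3] and arg[4] then
  convert(arg[1], arg[2], arg[3], arg[4])
end
```

The text file is much larger than the binary one, so it is probably worth compressing before copying it between machines.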
@antihutka Wow! This is a very simple and workable solution, exactly what I was looking for. Thank you very much! But what causes the problem? Would it be worth the developers releasing a patch to fix it?
I'm not sure there's a lot that can be done here. Torch's binary serialization formats are incompatible between platforms, and writing a conversion script might be the best option.
Marking this as solved. Platform-dependent binary serialization is documented in Torch. If other users encounter this problem, it may be worth documenting in this project too.
Hello. My situation is not very typical, and the problem is probably related to Torch itself, but I'm asking for your help.
I wrote a semi-automatic script that starts neural network training on NC instances (GPU, Tesla K80) in Azure. I use a CUDA Docker setup there.
I was also able to run torch-rnn on Win7 and Win10 with distro-win (it was very painful, especially on Win7; it is much easier on Win10). If you are reading this issue and want to run torch-rnn on Windows, you may wonder how I got torch-hdf5 to run.
My question is this: I trained the network on Azure (Ubuntu 16.04 x64, with GPU) and moved the checkpoint files to Windows 7 (x64, CPU only, no GPU), and I got an error when running sample.lua:
It probably has to do with the Torch version and the deserialization of .t7 files, because I found similar problems: #148 and #80. If you run the test from the latter issue, it fails with the same error:
I also tried training a network on another Windows machine (Win10 64-bit, with GPU) and using the checkpoint files from there on my Win7 machine (64-bit, CPU only, no GPU), and it worked! I have read in other issues that moving files from GPU to CPU works fine.
I understand that this error is related to deserialization, but I can't solve it. Please help me: I want to train a network on Azure and use it on Windows.