Allow to set a device when loading a model #154
Conversation
Edited the save and load test to also test the load method with all possible devices. Added the changes to the changelog.
# check if params are still the same after load
new_params = model.policy.state_dict()
# Check if the model loads as expected for every possible choice of device:
for device in ["auto", "cpu", "cuda"]:
I noticed that the git code comparison looks quite messy, so I'm elaborating on the changes I've made here to ease the review process for you:
The actual change that I made here is the added 'for' loop that goes over all possible devices, and at each iteration the device parameter is passed to the call of 'load' (line 76). At the end of each iteration I delete the model (line 92) so it can be loaded cleanly at the next iteration.
Everything else is the same as before, i.e., I've used the exact same test (inside the new 'for' loop) to ensure proper loading and tested with all possible values of the new argument 'device'.
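To make the structure concrete, here is a minimal sketch of that loop (illustrative only: the model class and save path are placeholders, not the repo's actual test fixtures):

```python
import os

from stable_baselines3 import PPO

# placeholder model and save path, just to illustrate the loop structure
save_path = "tmp_save_load_test.zip"
PPO("MlpPolicy", "CartPole-v1").save(save_path)

for device in ["auto", "cpu", "cuda"]:
    # the new `device` argument is passed to the call of load()
    model = PPO.load(save_path, device=device)
    # ... the same parameter / prediction checks as before go here ...
    del model  # delete the model so it can be loaded cleanly at the next iteration

os.remove(save_path)
```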
It seems that you are not actually testing that the device parameter was successfully used.
Also, you should skip the cuda device if no GPU is available.
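One way that skip could look, as a sketch (not necessarily the exact change being requested):

```python
import torch as th

for device in ["auto", "cpu", "cuda"]:
    if device == "cuda" and not th.cuda.is_available():
        # no GPU on this machine, so the cuda case cannot be exercised
        continue
    # ... load the model with `device` and run the usual checks ...
```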
- You're right. I will work on improving the test.
- What should be the expected behavior when a user uses "device='cuda'" on a machine with no GPU?
I noticed that the c'tor defaults to using the CPU in that case without notifying the user.
Anyway, I think the test should include all possible inputs while verifying that the outcome matches your expectations. Do you agree?
In my test I've used the utils.get_device() function (which is also used inside the constructor) to determine the expected device. This way, if, for example, the behavior of get_device changes, the test won't break.
… policy would change, it wouldn't break the test.
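Roughly, the idea is the following (a sketch; `save_path` stands in for the test's actual save location and PPO for whatever algorithm the test instantiates):

```python
from stable_baselines3 import PPO
from stable_baselines3.common.utils import get_device

for device in ["auto", "cpu", "cuda"]:
    # derive the expected device from the same helper the constructor uses,
    # so the assertion tracks get_device() if its behavior ever changes
    expected_device = get_device(device)
    model = PPO.load(save_path, device=device)
    assert model.device.type == expected_device.type
    del model
```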
@araffin's suggestion during the PR process
Co-authored-by: Antonin RAFFIN <[email protected]>
running on GPU, it yields this error:
Co-authored-by: Antonin RAFFIN <[email protected]>
Thanks for all your help! I'll look into it.
…et_device() doesn't provide device index. Now the code loads all of the model parameters from the saved state dict straight into the required device. (fixed load_from_zip_file).
When comparing the devices in the test, I restored the comparison of types only, since the "get_device()" function doesn't fill the device index, which causes problems.
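As a rough sketch of that idea (not the library's exact code, and the file name below is a placeholder): the resolved device is handed to torch when deserializing, so the tensors land directly on it, and the check then compares device types only.

```python
import torch as th

from stable_baselines3.common.utils import get_device

device = get_device("auto")

# load the saved state dict straight onto the requested device
with open("policy_params.pth", "rb") as file_handler:  # placeholder file name
    state_dict = th.load(file_handler, map_location=device)

# compare device *types* only: get_device() does not fill in a device index,
# so e.g. "cuda" and "cuda:0" would otherwise compare as different devices
some_param = next(iter(state_dict.values()))
assert some_param.device.type == device.type
```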
@@ -352,6 +352,7 @@ def load_from_pkl(path: Union[str, pathlib.Path, io.BufferedIOBase], verbose=0)
def load_from_zip_file(
    load_path: Union[str, pathlib.Path, io.BufferedIOBase],
    load_data: bool = True,
    device: Union[th.device, str] = "auto",
Does the order of the arguments here make sense? I'm not sure whether I should have added the new argument last, for cases where users didn't use explicit keyword arguments.
On the other hand, I think it makes more sense for it to sit in front of 'verbose'...
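For illustration, the two orderings being weighed (simplified signatures, not the actual annotated one from the diff above):

```python
# option used in this PR: the new argument sits in front of `verbose`
def load_from_zip_file(load_path, load_data=True, device="auto", verbose=0):
    ...

# alternative: append it last, so older positional calls keep their meaning
def load_from_zip_file(load_path, load_data=True, verbose=0, device="auto"):
    ...
```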
Now I'm observing another concerning issue related to this: on my GPU-capable machine, "test_predict.test_predict" fails on the same assertion (
I see... but you could easily fix that by passing
It doesn't fix the issue, unfortunately.
I think, as it is fast and easy to fix, please update the tests to use
…dated the assertion to consider only types of devices. Also corrected a related bug in 'get_device()' method.
LGTM, thanks =)
Added a 'device' keyword argument to BaseAlgorithm.load(), to enable users to load the model onto their device of choice.
Edited test_save_load to also test the load method with all possible devices.
Added the changes to the changelog (I'm not completely confident about the choice of words though).
Description
Added a 'device' keyword argument to BaseAlgorithm.load() with a default value of 'auto' (matching the hard-coded value used before my change), and forwarded it to the constructor call inside the load method instead of the hard-coded string.
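A simplified sketch of what that looks like (abridged and approximate, not the full load() implementation):

```python
@classmethod
def load(cls, load_path, env=None, device="auto", **kwargs):
    # the device now comes from the new keyword argument ("auto" by default)
    # instead of being hard-coded as the string "auto"
    data, params, pytorch_variables = load_from_zip_file(load_path, device=device)
    model = cls(policy=data["policy_class"], env=env, device=device, _init_setup_model=False)
    ...
```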
Motivation and Context
I'm using this repo for my research, and when I work with simpler models (a tiny MLP, for example) it is actually faster to use the CPU, even though my machine has a powerful GPU, since the overhead of the GPU calls outweighs the benefits.
Thus, when I load the models I'm training, I need to be able to easily force them to load on the CPU.
Since the code is already there, but the device is currently chosen by a hard-coded string inside the load method, I suggested making this small but significant change (:
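For example (illustrative file name), forcing a saved model onto the CPU at load time becomes a one-liner:

```python
from stable_baselines3 import PPO

# load a previously trained model onto the CPU, even on a GPU machine
model = PPO.load("my_trained_model.zip", device="cpu")
```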
Closes #153
Types of changes
Checklist:
- `make format` (required)
- `make check-codestyle` and `make lint` (required)
- `make pytest` and `make type` both pass (required)