ValueError in colab #261

Closed
amrzv opened this issue Jan 15, 2023 · 7 comments

Comments

@amrzv

amrzv commented Jan 15, 2023

Hi.
When running the samplefactory_hub_example.ipynb notebook in Colab, a ValueError is raised:
[screenshot]

There seems to be a bug in sample_factory.utils.gpu_utils.py.

@alex-petrenko
Owner

Looks like CUDA_VISIBLE_DEVICES contains something like "0,\n" (a trailing comma and a newline), which we don't expect.
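
For illustration, here is a minimal sketch of parsing such a value tolerantly. This is not the code from the fix referenced in this thread; the helper name parse_visible_devices is hypothetical, and it assumes the variable holds integer indices rather than GPU UUIDs.

```python
def parse_visible_devices(value):
    """Parse a CUDA_VISIBLE_DEVICES-style string into a list of integer indices.

    Hypothetical helper: ignores surrounding whitespace and empty entries,
    so a value such as "0,\n" yields [0] instead of raising ValueError.
    """
    if value is None:
        return None  # variable not set, so no restriction was expressed
    indices = []
    for token in value.split(","):
        token = token.strip()  # drops spaces and the trailing newline
        if token:  # skips the empty entry left by a trailing comma
            indices.append(int(token))
    return indices


print(parse_visible_devices("0,\n"))  # [0]
print(parse_visible_devices(""))      # []
```

An empty string yields an empty list here, which a caller could treat as "no GPUs visible".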

@alex-petrenko
Owner

Pushed a small fix, see if this helps :)

7509cc4

@amrzv
Author

amrzv commented Jan 16, 2023

I forgot to mention that this error was raised when running in Colab without a GPU. With the fix, the ValueError above is gone when running in Colab without a GPU, but another error is raised:
[screenshot]
It seems that the device has to be set manually in cfg beforehand:
cfg.device = 'gpu' if torch.cuda.is_available() else 'cpu'

However, when I add this line to cfg, restart the notebook, and run it again, I still see device=='gpu' in the logging output:
[screenshot]

I'm not sure whether everything is supposed to work on CPU.

When running in Colab with a GPU, the whole notebook works without errors.

@andrewzhang505
Collaborator

andrewzhang505 commented Jan 16, 2023

From the screenshot, it looks like the notebook is loading an old experiment whose cfg still has the device set to gpu. Could you try changing the experiment_name and running again with cfg.device=cpu? Alternatively, you can add the command-line argument --restart_behavior=overwrite to argv to wipe out the old experiment.
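
As a rough sketch of the second option, the argument list could be assembled like this. The helper build_argv and the experiment name are hypothetical; the --experiment and --device flag spellings are assumptions about the command line, and only --restart_behavior=overwrite is taken from this comment.

```python
def build_argv(experiment_name, device, overwrite=False):
    """Build an argument list for a training run (hypothetical helper)."""
    argv = [
        f"--experiment={experiment_name}",  # assumed flag spelling
        f"--device={device}",               # assumed flag spelling
    ]
    if overwrite:
        # Discards the saved experiment, including its stored cfg
        argv.append("--restart_behavior=overwrite")
    return argv


argv = build_argv("colab_cpu_run", "cpu", overwrite=True)
print(argv)
```

Either approach avoids reusing the stored cfg of the old experiment, which is what kept the device set to gpu.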

@amrzv
Author

amrzv commented Jan 16, 2023

I cleared the local dir, restarted the kernel, and ran it again. Now the notebook works without errors:
[screenshot]

So this line should probably be added to the notebook, as shown in the screenshot:
cfg.device = 'gpu' if torch.cuda.is_available() else 'cpu'
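
A self-contained sketch of that device selection, for reference. The SimpleNamespace object only stands in for the notebook's real cfg, and cuda_available is a hypothetical wrapper that falls back to CPU when torch is not installed.

```python
from types import SimpleNamespace


def cuda_available():
    """Return True only if torch is importable and reports a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


# Stand-in for the notebook's cfg object (assumption for illustration)
cfg = SimpleNamespace(device="gpu")
cfg.device = "gpu" if cuda_available() else "cpu"
print(cfg.device)
```

As noted earlier in the thread, this only takes effect if the run does not load the saved cfg of an older experiment.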

@alex-petrenko
Owner

@andrewzhang505 thank you for fixing this!

@amrzv thank you for reporting the issue!
Are you generally able to run serious multi-process configurations of Sample Factory in Colab notebooks? I have no experience with this myself, and I know that Jupyter notebooks had some issues with multiprocessing that prevented people from unlocking the full speed and power of the codebase.

I'd be happy to know if Colab solves this issue.

Also, certain types of environments work really well with a single-process synchronous configuration (such as Brax and IsaacGym). I can imagine interactive notebooks are great for these environments.

@amrzv
Author

amrzv commented Jan 17, 2023

I would say that the Colab environment has the same limitations as Jupyter.
So, as for multiprocessing, the behaviour and issues you mentioned are the same in Colab.
