Setup issue #471

Open
catid opened this issue Jul 14, 2024 · 1 comment

Comments

catid commented Jul 14, 2024

I'm trying to follow the simple README instructions on an Ubuntu server with 2x RTX 4090 GPUs and CUDA 12.4:

Installing the current project: cleanrl (2.0.0b1)
(cleanrl) ➜  cleanrl git:(master) poetry run python cleanrl/ppo.py \
    --seed 1 \
    --env-id CartPole-v0 \
    --total-timesteps 50000

/home/catid/mambaforge/envs/cleanrl/lib/python3.10/site-packages/gymnasium/envs/registration.py:523: DeprecationWarning: WARN: The environment CartPole-v0 is out of date. You should consider upgrading to version `v1`.
  logger.deprecation(
/home/catid/mambaforge/envs/cleanrl/lib/python3.10/site-packages/torch/cuda/__init__.py:146: UserWarning:
NVIDIA GeForce RTX 4090 with CUDA capability sm_89 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 4090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Traceback (most recent call last):
  File "/home/catid/sources/cleanrl/cleanrl/ppo.py", line 199, in <module>
    action, logprob, _, value = agent.get_action_and_value(next_obs)
  File "/home/catid/sources/cleanrl/cleanrl/ppo.py", line 122, in get_action_and_value
    logits = self.actor(x)
  File "/home/catid/mambaforge/envs/cleanrl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/catid/mambaforge/envs/cleanrl/lib/python3.10/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/catid/mambaforge/envs/cleanrl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/catid/mambaforge/envs/cleanrl/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

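The warning says the installed PyTorch build only supports CUDA capabilities sm_37, sm_50, sm_60 and sm_70, so it ships no kernels for the RTX 4090 (sm_89). That is why the actor network's F.linear call fails with "no kernel image is available for execution on the device".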
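To check ahead of time whether an installed PyTorch build can run on a given GPU, a minimal sketch like the following compares each device's compute capability with `torch.cuda.get_arch_list()` and then runs a tiny `nn.Linear` forward, the op that failed in the traceback. The `parse_arch`, `build_supports` and `report` helpers are hypothetical, and `build_supports` applies a simplified form of CUDA's compatibility rules (an `sm_XY` binary runs on devices with the same major version and minor ≥ Y; `compute_XY` PTX can be JIT-compiled for any capability ≥ X.Y). It is an assumption-laden approximation, not PyTorch's own check.

```python
from typing import Iterable, Optional, Tuple


def parse_arch(arch: str) -> Optional[Tuple[str, int, int]]:
    """Split a tag such as 'sm_86' or 'compute_70' into (kind, major, minor)."""
    kind, _, digits = arch.partition("_")
    if kind not in ("sm", "compute") or not digits.isdigit():
        return None  # unrecognised tags (e.g. 'sm_90a') are skipped
    value = int(digits)
    return kind, value // 10, value % 10


def build_supports(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Simplified CUDA compatibility check for one device capability."""
    major, minor = capability
    for arch in arch_list:
        parsed = parse_arch(arch)
        if parsed is None:
            continue
        kind, a_major, a_minor = parsed
        # SASS (sm_XY): same major version, device minor >= binary minor.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX (compute_XY): can be JIT-compiled for any newer capability.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


def report() -> None:
    """Print a per-GPU compatibility summary for the installed PyTorch."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    if not torch.cuda.is_available():
        print("This PyTorch build cannot see a CUDA device")
        return
    archs = torch.cuda.get_arch_list()
    print(f"torch {torch.__version__}, build archs: {' '.join(archs)}")
    for i in range(torch.cuda.device_count()):
        major, minor = torch.cuda.get_device_capability(i)
        ok = build_supports((major, minor), archs)
        name = torch.cuda.get_device_name(i)
        print(f"cuda:{i} {name} sm_{major}{minor}: "
              f"{'matching kernels' if ok else 'NO matching kernels'}")
        try:
            # Same kind of op as the failing frame (nn.Linear -> F.linear);
            # synchronize so an asynchronous launch error surfaces here.
            x = torch.randn(4, 4, device=f"cuda:{i}")
            torch.nn.Linear(4, 2).to(f"cuda:{i}")(x)
            torch.cuda.synchronize(i)
            print("  tiny Linear forward: OK")
        except RuntimeError as err:
            print(f"  tiny Linear forward failed: {err}")


if __name__ == "__main__":
    report()
```

With the architectures listed in the warning above (sm_37 sm_50 sm_60 sm_70, no compute_* entry), `build_supports((8, 9), ...)` returns False for the RTX 4090, while a build that lists an sm_8x entry with minor ≤ 9, such as sm_86, returns True.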

catid commented Jul 14, 2024

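I seem to have worked around this by installing a PyTorch nightly build for CUDA 12.4. pip3 wasn't available in the environment at first, so I bootstrapped it with python3 -m ensurepip before retrying: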

(cleanrl) ➜  cleanrl git:(master) pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124
zsh: command not found: pip3
(cleanrl) ➜  cleanrl git:(master) python3 -m pip --version
/home/catid/mambaforge/envs/cleanrl/bin/python3: No module named pip
(cleanrl) ➜  cleanrl git:(master) python3 -m ensurepip --upgrade
Looking in links: /tmp/tmp7d8tphfq
Requirement already satisfied: setuptools in /home/catid/mambaforge/envs/cleanrl/lib/python3.10/site-packages (67.7.2)
Processing /tmp/tmp7d8tphfq/pip-23.0.1-py3-none-any.whl
Installing collected packages: pip
Successfully installed pip-23.0.1
(cleanrl) ➜  cleanrl git:(master) pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124

...

  Attempting uninstall: torch
    Found existing installation: torch 1.12.1
    Uninstalling torch-1.12.1:
      Successfully uninstalled torch-1.12.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cleanrl 2.0.0b1 requires chex==0.1.5, which is not installed.
cleanrl 2.0.0b1 requires optax==0.1.4, which is not installed.
Successfully installed fsspec-2024.6.1 jinja2-3.1.4 mpmath-1.3.0 networkx-3.3 nvidia-cublas-cu12-12.4.2.65 nvidia-cuda-cupti-cu12-12.4.99 nvidia-cuda-nvrtc-cu12-12.4.99 nvidia-cuda-runtime-cu12-12.4.99 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.2.0.44 nvidia-curand-cu12-10.3.5.119 nvidia-cusolver-cu12-11.6.0.99 nvidia-cusparse-cu12-12.3.0.142 nvidia-nccl-cu12-2.21.5 nvidia-nvjitlink-cu12-12.4.99 nvidia-nvtx-cu12-12.4.99 pytorch-triton-3.0.0+dedb7bdf33 sympy-1.12.1 torch-2.5.0.dev20240714+cu124 torchaudio-2.4.0.dev20240714+cu124 torchvision-0.20.0.dev20240714+cu124 typing-extensions-4.12.2
(cleanrl) ➜  cleanrl git:(master) poetry run python cleanrl/ppo.py \
    --seed 1 \
    --env-id CartPole-v0 \
    --total-timesteps 50000

After that, the run seems to get past the CUDA error and progress further.
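The install log also shows pip's resolver flagging two cleanrl 2.0.0b1 pins it could not satisfy: chex==0.1.5 and optax==0.1.4. Both are JAX libraries, so they are likely only needed by cleanrl's JAX-based scripts rather than the PyTorch ppo.py, but a quick check can avoid surprises later. A minimal sketch using only the standard library's `importlib.metadata`; `check_pins` and `CLEANRL_PINS` are hypothetical names, and the pins are copied from the pip output rather than from cleanrl's own project files.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict

# Exact pins reported by pip's resolver for cleanrl 2.0.0b1 in the log.
CLEANRL_PINS: Dict[str, str] = {"chex": "0.1.5", "optax": "0.1.4"}


def check_pins(
    pins: Dict[str, str],
    lookup: Callable[[str], str] = version,
) -> Dict[str, str]:
    """Return a human-readable status for each pinned distribution."""
    results: Dict[str, str] = {}
    for name, wanted in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            results[name] = f"missing (need {wanted})"
            continue
        if installed == wanted:
            results[name] = f"ok ({installed})"
        else:
            results[name] = f"mismatch (have {installed}, need {wanted})"
    return results


if __name__ == "__main__":
    for name, status in check_pins(CLEANRL_PINS).items():
        print(f"{name}: {status}")
```

Run it inside the same environment the PPO script uses (for example with poetry run python) so it inspects the packages that will actually be imported. The `lookup` parameter only exists to make the function easy to test with a fake version table.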
