device='cuda:1' did not work #38

Open
tkotani opened this issue Aug 7, 2024 · 1 comment

Comments

tkotani commented Aug 7, 2024

In your simple example (https://github.com/princeton-vl/lietorch/blob/master/README.md#simple-example), I replaced

phi = torch.randn(8000, 3, device='cuda', requires_grad=True)

with

phi = torch.randn(8000, 3, device='cuda:1', requires_grad=True)

I saved the modified example as lietest1.py. Running

>python lietest1.py

fails with

Traceback (most recent call last):
  File "/home/takao/org/diffusion-point-cloud/lietest1.py", line 15, in <module>
    loss.backward()
  File "/home/takao/.local/lib/python3.10/site-packages/torch/_tensor.py", line 525, in backward
    torch.autograd.backward(
  File "/home/takao/.local/lib/python3.10/site-packages/torch/autograd/__init__.py", line 267, in backward
    _engine_run_backward(
  File "/home/takao/.local/lib/python3.10/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/takao/.local/lib/python3.10/site-packages/torch/autograd/function.py", line 301, in apply
    return user_fn(self, *args)
  File "/home/takao/.local/lib/python3.10/site-packages/lietorch/group_ops.py", line 24, in backward
    grad_inputs = cls.backward_op(ctx.group_id, grad, *inputs)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
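
The hint in the error message (CUDA_LAUNCH_BLOCKING=1) can be followed with a small launcher. This is a minimal sketch that uses only the Python standard library. The helper names `blocking_env` and `run_blocking` are hypothetical and are not part of lietorch or PyTorch. With synchronous launches the traceback should point at the call that actually faulted, which may be in the forward pass rather than in `loss.backward()`.

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    # CUDA_LAUNCH_BLOCKING=1 is the variable named in the error message.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_blocking(script):
    """Run `script` with the current interpreter; return its exit code."""
    # The variable must be set before the CUDA context is created,
    # so the script is started in a fresh process.
    return subprocess.run([sys.executable, script], env=blocking_env()).returncode


# Example call (lietest1.py is the file name used in this report):
# run_blocking("lietest1.py")
```

On a Unix-like shell, prefixing the command with the variable has the same effect: `CUDA_LAUNCH_BLOCKING=1 python lietest1.py`.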

Could you help me resolve this problem?

Best,
Takao Kotani

@edexheim

Hi,

I ran into a similar issue a while back when I wanted to use lietorch on multiple GPUs. In my case, adding CUDA device guards fixed it (edexheim@02f881d).
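
Device guards have to be added to the extension source, which then has to be rebuilt. For anyone who only needs a single non-default GPU, a process-level workaround may be enough. The sketch below is not the device-guard fix from the linked commit. It assumes the failure comes from kernels being launched on the default device, and it has not been verified against lietorch. The helper names `single_gpu_env` and `run_on_gpu` are hypothetical. The idea is to expose only one physical GPU to the process, so that it appears as `cuda:0` and the README example can keep `device='cuda'`.

```python
import os
import subprocess
import sys


def single_gpu_env(physical_index, base=None):
    """Return a copy of the environment that exposes a single GPU."""
    env = dict(os.environ if base is None else base)
    # Only this GPU is visible to the child process, where it is
    # addressed as 'cuda' or 'cuda:0'. 'cuda:1' no longer exists there.
    env["CUDA_VISIBLE_DEVICES"] = str(physical_index)
    return env


def run_on_gpu(script, physical_index):
    """Run `script` with only GPU `physical_index` visible; return its exit code."""
    env = single_gpu_env(physical_index)
    return subprocess.run([sys.executable, script], env=env).returncode


# Example call, assuming the script uses device='cuda' as in the README:
# run_on_gpu("lietest1.py", 1)
```

The shell equivalent is `CUDA_VISIBLE_DEVICES=1 python lietest1.py`. The index follows CUDA's own device enumeration, which may differ from the order shown by nvidia-smi. This does not help when one process needs several GPUs at once; that case still needs the guards.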

Best,
Eric
