
test_grad_checkpoint.py fails if PyTorch is compiled with CUDA support. #6086

Closed · ysiraichi opened this issue Dec 9, 2023 · 1 comment · Fixed by #6178

🐛 Bug

Enabling CUDA support for PyTorch on CI breaks test_grad_checkpoint.py. I managed to reproduce this issue by modifying the test:

diff --git a/test/test_grad_checkpoint.py b/test/test_grad_checkpoint.py
index 9a5fd19aa..e7d29357b 100644
--- a/test/test_grad_checkpoint.py
+++ b/test/test_grad_checkpoint.py
@@ -4,6 +4,7 @@ import torch_xla.debug.metrics as met
 import torch_xla
 import torch_xla.utils.checkpoint as checkpoint
 
+torch.cuda.init()
 
 def run():
   device = xm.xla_device()
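
For context on why the one-line torch.cuda.init() change flips the behavior: the reentrant checkpoint only stashes per-device RNG state when CUDA has already been initialized, which it detects through the internal torch.cuda._initialized flag. A runnable probe of that flag (a sketch, assuming a CUDA-enabled PyTorch build):

import torch

print(torch.cuda._initialized)  # False in a fresh process
torch.cuda.init()
print(torch.cuda._initialized)  # True; CheckpointFunction.forward now calls
                                # get_device_states(*args), per the traceback below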

Running the modified test_grad_checkpoint.py then produces the following error:

$ python test/test_grad_checkpoint.py
Traceback (most recent call last):
  File "test/test_grad_checkpoint.py", line 37, in <module>
    run()
  File "test/test_grad_checkpoint.py", line 27, in run
    x = checkpoint.checkpoint(layer, x)
  File "xla/torch_xla/utils/checkpoint.py", line 212, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "xla/torch_xla/utils/checkpoint.py", line 49, in forward
    ctx.fwd_gpu_devices, ctx.fwd_gpu_states = get_device_states(*args)
  File "torch/utils/checkpoint.py", line 177, in get_device_states
    device_module = _get_device_module(_infer_device_type(*args))
  File "torch/utils/checkpoint.py", line 97, in _get_device_module
    device_module = getattr(torch, device)
  File "torch/__init__.py", line 1927, in __getattr__
    raise AttributeError(f"module '{__name__}' has no attribute '{name}'")
AttributeError: module 'torch' has no attribute 'xla'
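
The last few frames show the root cause: torch.utils.checkpoint infers a device type string from the tensor arguments ("xla" here) and then resolves the backend module with getattr(torch, device). The XLA backend lives in the separate torch_xla package, not as a torch attribute, so the lookup fails. A minimal sketch of the failing lookup, independent of the test:

# Reproduces the failing getattr from torch/utils/checkpoint.py with an XLA tensor.
import torch
import torch_xla.core.xla_model as xm

x = torch.ones(2, 2, device=xm.xla_device())
device_type = x.device.type   # "xla"
getattr(torch, device_type)   # AttributeError: module 'torch' has no attribute 'xla'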

Environment

Additional Context

Blocking: #6070

@ysiraichi (Author) commented:

This issue should be fixed in the main PyTorch repo.
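
Until that lands, one possible user-side workaround (an assumption on my part, not the change from #6178) is to disable RNG-state preservation so get_device_states() is never reached. This assumes torch_xla's wrapper forwards the preserve_rng_state kwarg the way torch.utils.checkpoint does, and it gives up RNG reproducibility inside the checkpointed segment:

# Hypothetical workaround sketch; preserve_rng_state=False skips the
# device-state stashing in CheckpointFunction.forward entirely.
import torch
import torch_xla.core.xla_model as xm
import torch_xla.utils.checkpoint as checkpoint

device = xm.xla_device()
layer = torch.nn.Linear(4, 4).to(device)
x = torch.ones(2, 4, device=device, requires_grad=True)
y = checkpoint.checkpoint(layer, x, preserve_rng_state=False)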
