
TPU available: true when there are no TPUs #3104

Closed
dalmia opened this issue Aug 22, 2020 · 6 comments · Fixed by #3274
Labels
accelerator: tpu, bug

Comments

dalmia commented Aug 22, 2020

🐛 Bug

I am using a DGX machine (and so, no TPUs), but on initiating the Trainer, it logs TPU available: True. My script then fails with a Missing XLA configuration error.

To Reproduce

Code sample

Simply running the following lines on my machine:

>>> trainer = pl.Trainer(gpus=[0])
GPU available: True, used: True
TPU available: True, using: 0 TPU cores

Expected behavior

>>> trainer = pl.Trainer(gpus=[0])
GPU available: True, used: True
TPU available: False, using: 0 TPU cores

Environment

* CUDA:
        - GPU:
                - Tesla V100-SXM2-32GB
        - available:         True
        - version:           10.2
* Packages:
        - numpy:             1.18.2
        - pyTorch_debug:     False
        - pyTorch_version:   1.6.0
        - pytorch-lightning: 0.9.0
        - tensorboard:       2.2.0
        - tqdm:              4.45.0
* System:
        - OS:                Linux
        - architecture:
                - 64bit
                - 
        - processor:         x86_64
        - python:            3.6.9
        - version:           #168-Ubuntu SMP Wed Jan 16 21:00:45 UTC 2019
dalmia added the bug and help wanted labels Aug 22, 2020

Borda commented Aug 24, 2020

Sounds like some misconfiguration issue. Are you interested in sending a PR? 🐰

Borda added the accelerator: tpu label Aug 24, 2020

dalmia commented Aug 28, 2020

Sure. I realized that the bug is in this script.

Specifically:

try:
    import torch_xla.core.xla_model as xm
except ImportError:
    XLA_AVAILABLE = False
else:
    XLA_AVAILABLE = True

So, if the environment has torch_xla installed but no TPU, then this error is thrown. If I use an environment without torch_xla, it works fine. Is this something that should be fixed in the codebase, or something the user should take care of? @Borda
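
For reference, a minimal sketch of what device-based detection could look like instead, assuming torch_xla's public xm.get_xla_supported_devices API (an illustration of the idea, not the exact fix that landed):

try:
    import torch_xla.core.xla_model as xm
except ImportError:
    XLA_AVAILABLE = False
else:
    XLA_AVAILABLE = True

def tpu_device_exists() -> bool:
    """Return True only if the XLA runtime can actually see a TPU."""
    if not XLA_AVAILABLE:
        # Without the package there is certainly no usable TPU.
        return False
    try:
        # Import success only proves torch_xla is installed; ask the
        # XLA runtime whether a TPU device is actually reachable.
        return bool(xm.get_xla_supported_devices("TPU"))
    except RuntimeError:
        # Raised (e.g. "Missing XLA configuration") when torch_xla is
        # installed but no TPU is configured.
        return False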


Borda commented Aug 28, 2020

Yes, we had the import-based XLA detection as a temporary solution, as we did not expect someone would install XLA without having a TPU...
So, please send a PR; I think we have this pattern in several files...

edenlightning added this to the 0.9.x milestone Sep 1, 2020
stale bot commented Oct 1, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the won't fix label Oct 1, 2020
rohitgr7 removed the won't fix label Oct 1, 2020
edenlightning modified the milestones: 0.9.x, 1.0 Oct 4, 2020
edenlightning removed the help wanted label Oct 5, 2020

alimoezzi commented

I faced this issue on a gcloud VM with a GPU but no TPU.

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/utilities/xla_device.py", line 32, in inner_f
    queue.put(func(*args, **kwargs))
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/utilities/xla_device.py", line 73, in _is_device_tpu
    return (xm.xrt_world_size() > 1) or bool(xm.get_xla_supported_devices("TPU"))
  File "/opt/conda/lib/python3.7/site-packages/torch_xla/core/xla_model.py", line 137, in get_xla_supported_devices
    xla_devices = _DEVICES.value
  File "/opt/conda/lib/python3.7/site-packages/torch_xla/utils/utils.py", line 32, in value
    self._value = self._gen_fn()
  File "/opt/conda/lib/python3.7/site-packages/torch_xla/core/xla_model.py", line 19, in <lambda>
    _DEVICES = xu.LazyProperty(lambda: torch_xla._XLAC._xla_get_devices())
RuntimeError: tensorflow/compiler/xla/xla_client/computation_client.cc:273 : Missing XLA configuration

torch_xla was installed by default in the Deep Learning image, and the only solution was to manually uninstall torch_xla.
I used the latest PL version, 1.6.4.
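
For context, the traceback above comes from Lightning's probe-in-a-subprocess pattern in pytorch_lightning/utilities/xla_device.py: the TPU check runs in a child process, so a hung or crashing XLA runtime cannot stall the Trainer itself. A rough, illustrative sketch of that pattern follows; the names and timeout value are placeholders, not PL's exact implementation:

import multiprocessing as mp

TIMEOUT_SECONDS = 10  # placeholder value, not PL's actual timeout

def _probe_tpu(queue):
    # Runs in a child process so a hang cannot block the parent.
    try:
        import torch_xla.core.xla_model as xm
        queue.put(bool(xm.get_xla_supported_devices("TPU")))
    except Exception:
        # Covers ImportError and the "Missing XLA configuration" RuntimeError.
        queue.put(False)

def tpu_available() -> bool:
    queue = mp.Queue()
    proc = mp.Process(target=_probe_tpu, args=(queue,))
    proc.start()
    proc.join(TIMEOUT_SECONDS)
    if proc.is_alive():
        # The probe hung; kill it and report no TPU.
        proc.terminate()
        return False
    return queue.get() if not queue.empty() else False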

apezzi commented Sep 20, 2022

@realsarm this worked for me on gcloud with one GPU but no TPU.
In the shell, I ran pip uninstall torch_xla.

Thanks a lot :)
