pytorch lightning tpu on kaggle kernel error #10726

Closed
germayneng opened this issue Nov 24, 2021 · 5 comments · Fixed by #10836
Labels: accelerator: tpu (Tensor Processing Unit), bug (Something isn't working), priority: 0 (High priority task)

Comments

@germayneng

germayneng commented Nov 24, 2021

🐛 Bug

Hi, I am not able to get PyTorch Lightning working with a TPU v3-8 on Kaggle. I have followed all the instructions as well as the Kaggle kernel examples, and it does not seem to work, at least for PyTorch Lightning v1.4.

The error when importing Lightning is as follows:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/tmp/ipykernel_47/4005941092.py in <module>
----> 1 import pytorch_lightning as pl

/opt/conda/lib/python3.7/site-packages/pytorch_lightning/__init__.py in <module>
     19 
     20 from pytorch_lightning import metrics  # noqa: E402
---> 21 from pytorch_lightning.callbacks import Callback  # noqa: E402
     22 from pytorch_lightning.core import LightningDataModule, LightningModule  # noqa: E402
     23 from pytorch_lightning.trainer import Trainer  # noqa: E402

/opt/conda/lib/python3.7/site-packages/pytorch_lightning/callbacks/__init__.py in <module>
     22 from pytorch_lightning.callbacks.prediction_writer import BasePredictionWriter
     23 from pytorch_lightning.callbacks.progress import ProgressBar, ProgressBarBase
---> 24 from pytorch_lightning.callbacks.pruning import ModelPruning
     25 from pytorch_lightning.callbacks.quantization import QuantizationAwareTraining
     26 from pytorch_lightning.callbacks.stochastic_weight_avg import StochasticWeightAveraging

/opt/conda/lib/python3.7/site-packages/pytorch_lightning/callbacks/pruning.py in <module>
     29 import pytorch_lightning as pl
     30 from pytorch_lightning.callbacks.base import Callback
---> 31 from pytorch_lightning.core.lightning import LightningModule
     32 from pytorch_lightning.utilities.apply_func import apply_to_collection
     33 from pytorch_lightning.utilities.distributed import rank_zero_debug, rank_zero_only

/opt/conda/lib/python3.7/site-packages/pytorch_lightning/core/__init__.py in <module>
     14 
     15 from pytorch_lightning.core.datamodule import LightningDataModule
---> 16 from pytorch_lightning.core.lightning import LightningModule
     17 
     18 __all__ = ["LightningDataModule", "LightningModule"]

/opt/conda/lib/python3.7/site-packages/pytorch_lightning/core/lightning.py in <module>
     39 from pytorch_lightning.core.optimizer import LightningOptimizer
     40 from pytorch_lightning.core.saving import ModelIO
---> 41 from pytorch_lightning.trainer.connectors.logger_connector.fx_validator import FxValidator
     42 from pytorch_lightning.utilities import rank_zero_deprecation, rank_zero_warn
     43 from pytorch_lightning.utilities.apply_func import apply_to_collection, convert_to_tensors

/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/__init__.py in <module>
     16 """
     17 
---> 18 from pytorch_lightning.trainer.trainer import Trainer
     19 from pytorch_lightning.utilities.seed import seed_everything
     20 

/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in <module>
     36 from pytorch_lightning.plugins import Plugin
     37 from pytorch_lightning.plugins.environments import ClusterEnvironment
---> 38 from pytorch_lightning.profiler import (
     39     AdvancedProfiler,
     40     BaseProfiler,

/opt/conda/lib/python3.7/site-packages/pytorch_lightning/profiler/__init__.py in <module>
    201 from pytorch_lightning.profiler.pytorch import PyTorchProfiler
    202 from pytorch_lightning.profiler.simple import SimpleProfiler
--> 203 from pytorch_lightning.profiler.xla import XLAProfiler
    204 
    205 __all__ = [

/opt/conda/lib/python3.7/site-packages/pytorch_lightning/profiler/xla.py in <module>
     48 
     49 if _TPU_AVAILABLE:
---> 50     import torch_xla.debug.profiler as xp
     51 
     52 log = logging.getLogger(__name__)

ModuleNotFoundError: No module named 'torch_xla.debug.profiler'
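
The traceback ends at an unconditional `import torch_xla.debug.profiler` that runs whenever a TPU is detected, so the import fails if the installed `torch_xla` build does not ship that submodule. As a rough diagnostic sketch (not the fix that was merged for this issue), one could check whether the submodule can be located before importing it. The helper name `module_available` is hypothetical and only for illustration:

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the dotted module path can be located.

    Parent packages are imported during the lookup, but the leaf module is not.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (e.g. torch_xla itself) is missing
        # or fails to import.
        return False


# Hypothetical guard: only import the profiler when the submodule exists.
if module_available("torch_xla.debug.profiler"):
    import torch_xla.debug.profiler as xp
else:
    xp = None
    print("torch_xla.debug.profiler not found; XLA profiling is unavailable")
```

Running this in the affected notebook should show whether the problem is the missing submodule in the installed `torch_xla` rather than anything in the training code.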

To Reproduce

I get an error when I try to import the pytorch_lightning module.

To reproduce, start a TPU Kaggle kernel notebook and run:


! curl https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py -o pytorch-xla-env-setup.py
! python pytorch-xla-env-setup.py --version 1.7 --apt-packages libomp5 libopenblas-dev
! pip install pytorch-lightning==1.4.0

Expected behavior

To be able to import PyTorch Lightning and create a Trainer with tpu_cores enabled for training.

Environment

  • PyTorch Lightning Version 1.4.0
  • PyTorch Version (e.g., 1.10): 1.7.0a0 / XLA: 1.7
  • Python version (e.g., 3.9): 3.7.10
  • OS (e.g., Linux): n/a
  • CUDA/cuDNN version: cuda11
  • GPU models and configuration:
  • How you installed PyTorch (conda, pip, source): pip
  • If compiling from source, the output of torch.__config__.show():
  • Any other relevant information:

cc @kaushikb11 @tchaton

@germayneng germayneng added the bug Something isn't working label Nov 24, 2021
@germayneng germayneng changed the title from "pytorch lightning tpu on kaggle kernel" to "pytorch lightning tpu on kaggle kernel error" Nov 24, 2021
@rohitgr7 rohitgr7 added the accelerator: tpu Tensor Processing Unit label Nov 24, 2021
@tchaton tchaton added the priority: 1 Medium priority task label Nov 24, 2021
@tchaton
Contributor

tchaton commented Nov 24, 2021

Dear @germayneng,

Would you mind updating to the latest version of Lightning?

@tchaton
Contributor

tchaton commented Nov 24, 2021

@kaushikb11

@germayneng
Author

I am still getting the same error with Lightning 1.5.3:

ModuleNotFoundError: No module named 'torch_xla.debug.profiler'

I believe I am installing version 1.7 of torch_xla. Is this correct?

!curl https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py -o pytorch-xla-env-setup.py
!python pytorch-xla-env-setup.py --version 1.7 --apt-packages libomp5 libopenblas-dev
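
To confirm which versions the setup script actually left in the kernel, a small sketch like the following could be run in a notebook cell. The helper `installed_version` is hypothetical, and the distribution names `torch-xla` and `pytorch-lightning` are assumptions about how the packages are registered with pip:

```python
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        from importlib import metadata  # available on Python 3.8+
    except ImportError:
        metadata = None

    if metadata is not None:
        try:
            return metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            return None

    # Fallback for Python 3.7 (the version in this report);
    # pkg_resources ships with setuptools.
    import pkg_resources

    try:
        return pkg_resources.get_distribution(dist_name).version
    except pkg_resources.DistributionNotFound:
        return None


# Assumed distribution names; adjust if pip lists them differently.
for dist in ("torch", "torch-xla", "pytorch-lightning"):
    print(dist, installed_version(dist))
```

A `None` for any entry would mean that distribution is not visible to the notebook's Python environment under that name.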

@tchaton tchaton added priority: 0 High priority task and removed priority: 1 Medium priority task labels Nov 29, 2021
@germayneng
Author

Hi, any updates on this?

@githendumukiri

Any updates?
