
Commit

Merge branch 'master' into trainer-state-refactor
carmocca authored May 4, 2021
2 parents 05cd4fc + a6aa1a0 commit c3f6354
Showing 27 changed files with 641 additions and 336 deletions.
85 changes: 85 additions & 0 deletions .azure-pipelines/ipu-tests.yml
@@ -0,0 +1,85 @@
trigger:
  tags:
    include:
      - '*'
  branches:
    include:
      - master
      - release/*
      - refs/tags/*
pr:
  - master
  - release/*

variables:
  - name: poplar_sdk
    value: "poplar_sdk-ubuntu_20_04-2.0.0+481-79b41f85d1"

jobs:
  - job: ipu

    pool: graphcore-ipus

    workspace:
      clean: all

    steps:
      - script: tar -xvzf /opt/poplar/${{ variables.poplar_sdk }}.tar.gz
        displayName: "Extract Poplar SDK"

      - script: |
          set -eux
          pip debug --verbose
          pip install ${{ variables.poplar_sdk }}/poptorch-*ubuntu*.whl
        displayName: "Install poptorch"
      - script: |
          set -eux
          source ${{ variables.poplar_sdk }}/poplar-ubuntu*/enable.sh
          NUM_IPUS=$(gc-info --ipu-count)
          if [[ -z "${NUM_IPUS}" ]] || [[ "${NUM_IPUS}" -eq 0 ]]; then
              echo "No IPUs found to reset. Exiting"
              exit 1
          fi
          echo "Resetting parity on ${NUM_IPUS} IPU devices"
          i=0
          while [[ i -lt "${NUM_IPUS}" ]]; do
              gc-reset -d "${i}"
              i=$((i + 1))
          done
        displayName: "Reset IPU devices"
      - bash: |
          export GIT_TERMINAL_PROMPT=1
          pip install --requirement requirements.txt
          python -c "fname = 'requirements/extra.txt' ; lines = [line for line in open(fname).readlines() if 'fairscale' not in line] ; open(fname, 'w').writelines(lines)"
          python -c "fname = 'requirements/extra.txt' ; lines = [line for line in open(fname).readlines() if 'horovod' not in line] ; open(fname, 'w').writelines(lines)"
          pip install --requirement ./requirements/devel.txt --upgrade-strategy only-if-needed
          pip list
        displayName: 'Install dependencies'
      - bash: |
          python tests/collect_env_details.py
          python -c "import torch"
        displayName: 'Env details'
      - script: |
          set -eux
          source ${{ variables.poplar_sdk }}/poplar-ubuntu*/enable.sh
          source ${{ variables.poplar_sdk }}/popart-ubuntu*/enable.sh
          python -c "import poptorch; print(poptorch.__version__)"
        displayName: "Check poptorch installation"
      - bash: |
          wget https://pl-public-data.s3.amazonaws.com/legacy/checkpoints.zip -P legacy/
          unzip -o legacy/checkpoints.zip -d legacy/
          ls -l legacy/checkpoints/
        displayName: 'Get legacy checkpoints'
      - bash: |
          source ${{ variables.poplar_sdk }}/poplar-ubuntu*/enable.sh
          source ${{ variables.poplar_sdk }}/popart-ubuntu*/enable.sh
          python -m coverage run --source pytorch_lightning -m pytest pytorch_lightning tests -v --junitxml=$(Build.StagingDirectory)/test-results.xml --durations=50
        displayName: 'Testing: standard'
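
The "Install dependencies" step above prunes fairscale and horovod from requirements/extra.txt with inline python -c one-liners; the base-xla Dockerfile further down in this diff uses the same pattern. As a hedged illustration, the same filtering logic could be written as a small standalone helper — the filter_requirements name and the __main__ wiring below are illustrative additions, not code from the repository:

# Illustrative sketch only: the requirement filtering that the CI step performs
# with inline "python -c" commands, written as a reusable helper.
from typing import Sequence


def filter_requirements(fname: str, excluded: Sequence[str]) -> None:
    # Rewrite `fname`, dropping every line that mentions one of the excluded packages.
    with open(fname) as fp:
        lines = fp.readlines()
    kept = [line for line in lines if not any(pkg in line for pkg in excluded)]
    with open(fname, "w") as fp:
        fp.writelines(kept)


if __name__ == "__main__":
    # equivalent to the two one-liners in the "Install dependencies" step above
    filter_requirements("requirements/extra.txt", ["fairscale", "horovod"])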
2 changes: 1 addition & 1 deletion .github/workflows/ci_dockers.yml
@@ -10,7 +10,7 @@ on: # Trigger the workflow on push or pull request, but only for the master bran
paths:
- "dockers/**"
- "!dockers/README.md"
- "requirements/*.txt"
- "requirements/*"
- "environment.yml"
- "requirements.txt"
- ".github/workflows/*docker*.yml"
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -196,6 +196,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- `LightningModule.from_datasets()` now accepts `IterableDataset` instances as training datasets. ([#7503](https://github.com/PyTorchLightning/pytorch-lightning/pull/7503))


- Changed `resume_from_checkpoint` warning to an error when the checkpoint file does not exist ([#7075](https://github.com/PyTorchLightning/pytorch-lightning/pull/7075))


### Deprecated


@@ -246,6 +249,10 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
[#6659](https://github.com/PyTorchLightning/pytorch-lightning/pull/6659),
)

- Deprecated the `LightningModule.datamodule` getter and setter methods; access them through `Trainer.datamodule` instead ([#7168](https://github.com/PyTorchLightning/pytorch-lightning/pull/7168))


- Deprecated the use of `Trainer(gpus="i")` (string) for selecting the i-th GPU; from v1.5 this will set the number of GPUs instead of the index ([#6388](https://github.com/PyTorchLightning/pytorch-lightning/pull/6388))

### Removed

@@ -362,6 +369,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Fixed bug where `BaseFinetuning.flatten_modules()` was duplicating leaf node parameters ([#6879](https://github.com/PyTorchLightning/pytorch-lightning/pull/6879))


- Fixed bug where the learning rate schedulers did not follow the optimizer frequencies ([#4868](https://github.com/PyTorchLightning/pytorch-lightning/pull/4868))


- Fixed `EarlyStopping` logic when `min_epochs` or `min_steps` requirement is not met ([#6705](https://github.com/PyTorchLightning/pytorch-lightning/pull/6705))


@@ -455,6 +465,10 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Fixed optimizer `state` not moved to `GPU` ([#7277](https://github.com/PyTorchLightning/pytorch-lightning/pull/7277))


- Fixed custom init args for `WandbLogger` ([#6989](https://github.com/PyTorchLightning/pytorch-lightning/pull/6989))



## [1.2.7] - 2021-04-06

### Fixed
21 changes: 9 additions & 12 deletions dockers/base-xla/Dockerfile
@@ -29,7 +29,7 @@ ENV \
DEBIAN_FRONTEND=noninteractive \
CONDA_ENV=lightning

# show system inforation
# show system info
RUN lsb_release -a && cat /etc/*-release

RUN apt-get update -qq && \
@@ -42,13 +42,13 @@ RUN apt-get update -qq && \
ca-certificates \
libomp5 \
&& \
# Install conda and python.
# NOTE new Conda does not forward the exit status... https://github.com/conda/conda/issues/8385
# Install conda and python.
# NOTE new Conda does not forward the exit status... https://github.com/conda/conda/issues/8385
curl -o ~/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-py38_${CONDA_VERSION}-Linux-x86_64.sh && \
chmod +x ~/miniconda.sh && \
~/miniconda.sh -b && \
rm ~/miniconda.sh && \
# Cleaning
# Cleaning
apt-get autoremove -y && \
apt-get clean && \
rm -rf /root/.cache && \
@@ -79,7 +79,7 @@ ENV \
RUN pip --version && \
pip config set global.cache-dir false && \
conda remove pytorch torchvision && \
# Install Pytorch XLA
# Install Pytorch XLA
py_version=${PYTHON_VERSION/./} && \
# Python 3.7 wheels are available. Replace cp36-cp36m with cp37-cp37m
gsutil cp "gs://tpu-pytorch/wheels/torch-${XLA_VERSION}-cp${py_version}-cp${py_version}m-linux_x86_64.whl" . && \
@@ -91,20 +91,17 @@ RUN pip --version && \
# Get package
COPY ./ ./pytorch-lightning/

# Install pytorch-lightning dependencies.
RUN \
python --version && \
# Install PL dependencies
cd pytorch-lightning && \
# drop Torch as it was installed with XLA
# drop packages installed with XLA
python -c "fname = 'requirements.txt' ; lines = [line for line in open(fname).readlines() if not line.startswith('torch')] ; open(fname, 'w').writelines(lines)" && \
# drop Horovod as it is not needed
python -c "fname = 'requirements/examples.txt' ; lines = [line for line in open(fname).readlines() if not line.startswith('torchvision')] ; open(fname, 'w').writelines(lines)" && \
# drop unnecessary packages
python -c "fname = 'requirements/extra.txt' ; lines = [line for line in open(fname).readlines() if not line.startswith('horovod')] ; open(fname, 'w').writelines(lines)" && \
# drop fairscale as it is not needed
python -c "fname = 'requirements/extra.txt' ; lines = [line for line in open(fname).readlines() if 'fairscale' not in line] ; open(fname, 'w').writelines(lines)" && \
# drop TorchVision as it was installed with XLA
python -c "fname = 'requirements/examples.txt' ; lines = [line for line in open(fname).readlines() if not line.startswith('torchvision')] ; open(fname, 'w').writelines(lines)" && \
python ./requirements/adjust_versions.py ./requirements/extra.txt && \
# install PL dependencies
pip install --requirement ./requirements/devel.txt --no-cache-dir && \
cd .. && \
rm -rf pytorch-lightning && \
6 changes: 5 additions & 1 deletion docs/source/advanced/multi_gpu.rst
@@ -226,13 +226,17 @@ Note in particular the difference between `gpus=0`, `gpus=[0]` and `gpus="0"`.
+---------------+-----------+---------------------+---------------------------------+
| "0"           | str       | [0]                 | GPU 0                           |
+---------------+-----------+---------------------+---------------------------------+
| "3"           | str       | [3]                 | GPU 3                           |
| "3"           | str       | [3]                 | GPU 3 (will change in v1.5)     |
+---------------+-----------+---------------------+---------------------------------+
| "1, 3"        | str       | [1, 3]              | GPUs 1 and 3                    |
+---------------+-----------+---------------------+---------------------------------+
| "-1"          | str       | [0, 1, 2, ...]      | all available GPUs              |
+---------------+-----------+---------------------+---------------------------------+

.. warning::
    The behavior for :code:`gpus="3"` (str) will change. Currently it selects the GPU with index 3, but will
    select the first 3 GPUs from v1.5.
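
To make the table and warning above concrete, here is a hedged sketch (an editor's illustration, not part of the documentation file) of the corresponding Trainer calls; it assumes a machine with at least four visible GPUs, and only the plain-string form changes meaning in v1.5:

# Illustrative only: the `gpus` values from the table above. Assumes >= 4 GPUs are visible.
from pytorch_lightning import Trainer

Trainer(gpus=3)        # int  -> uses the first 3 GPUs: [0, 1, 2]
Trainer(gpus=[1, 3])   # list -> uses GPUs 1 and 3
Trainer(gpus="1, 3")   # str  -> uses GPUs 1 and 3
Trainer(gpus="-1")     # str  -> uses all available GPUs
Trainer(gpus="3")      # str  -> GPU 3 today; from v1.5 this selects the first 3 GPUs instead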

.. note::

    When specifying number of gpus as an integer ``gpus=k``, setting the trainer flag
2 changes: 1 addition & 1 deletion docs/source/common/lightning_module.rst
@@ -994,7 +994,7 @@ Set or access your datamodule.
.. code-block:: python

    def configure_optimizers(self):
        num_training_samples = len(self.datamodule.train_dataloader())
        num_training_samples = len(self.trainer.datamodule.train_dataloader())
        ...
--------------
2 changes: 1 addition & 1 deletion docs/source/common/trainer.rst
@@ -870,7 +870,7 @@ logger

|
:doc:`Logger <../common/loggers>` (or iterable collection of loggers) for experiment tracking.
:doc:`Logger <../common/loggers>` (or iterable collection of loggers) for experiment tracking. A ``True`` value uses the default ``TensorBoardLogger`` shown below. ``False`` will disable logging.

.. testcode::

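As a hedged illustration of the sentence above (the elided testcode block in the file shows the default configuration), the three common values of the logger flag could be passed like this; the save_dir and name arguments are placeholders:

# Illustrative sketch: common values for the Trainer `logger` flag.
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import TensorBoardLogger

Trainer(logger=True)   # default TensorBoardLogger, as in the docs example
Trainer(logger=False)  # disable logging entirely
Trainer(logger=TensorBoardLogger(save_dir="logs/", name="my_run"))  # explicit logger instance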
134 changes: 60 additions & 74 deletions pytorch_lightning/core/datamodule.py
@@ -24,80 +24,7 @@
from pytorch_lightning.utilities.argparse import add_argparse_args, from_argparse_args, get_init_arguments_and_types


class _DataModuleWrapper(type):

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self.__has_added_checks = False

    def __call__(cls, *args, **kwargs):
        """A wrapper for LightningDataModule that:

        1. Runs user defined subclass's __init__
        2. Assures prepare_data() runs on rank 0
        3. Lets you check prepare_data and setup to see if they've been called
        """
        if not cls.__has_added_checks:
            cls.__has_added_checks = True
            # Track prepare_data calls and make sure it runs on rank zero
            cls.prepare_data = track_data_hook_calls(rank_zero_only(cls.prepare_data))
            # Track setup calls
            cls.setup = track_data_hook_calls(cls.setup)
            # Track teardown calls
            cls.teardown = track_data_hook_calls(cls.teardown)

        # Get instance of LightningDataModule by mocking its __init__ via __call__
        obj = type.__call__(cls, *args, **kwargs)

        return obj


def track_data_hook_calls(fn):
    """A decorator that checks if prepare_data/setup/teardown has been called.

    - When ``dm.prepare_data()`` is called, ``dm.has_prepared_data`` gets set to True
    - When ``dm.setup()``, ``dm.has_setup_{fit,validate,test}`` get set to True
    - When ``dm.setup(stage)`` is called, where stage is any of ``{fit,validate,test,predict}``.
      Its corresponding `dm_has_setup_{stage}` attribute gets set to True
    - ``dm.teardown()`` and ``dm.teardown(stage)`` act exactly like ``dm.setup``

    Args:
        fn (function): Function that will be tracked to see if it has been called.

    Returns:
        function: Decorated function that tracks its call status and saves it to private attrs in its obj instance.
    """

    @functools.wraps(fn)
    def wrapped_fn(*args, **kwargs):

        # The object instance from which setup or prepare_data was called
        obj = args[0]
        name = fn.__name__

        # If calling setup, we check the stage and assign stage-specific bool args
        if name in ("setup", "teardown"):

            # Get stage either by grabbing from args or checking kwargs.
            # If not provided, set call status of 'fit', 'validate', and 'test' to True.
            # We do this so __attach_datamodule in trainer.py doesn't mistakenly call setup('test') on trainer.test()
            stage = args[1] if len(args) > 1 else kwargs.get("stage", None)

            if stage is None:
                for s in ("fit", "validate", "test"):
                    setattr(obj, f"_has_{name}_{s}", True)
            else:
                setattr(obj, f"_has_{name}_{stage}", True)

        elif name == "prepare_data":
            obj._has_prepared_data = True

        return fn(*args, **kwargs)

    return wrapped_fn


class LightningDataModule(CheckpointHooks, DataHooks, metaclass=_DataModuleWrapper):
class LightningDataModule(CheckpointHooks, DataHooks):
"""
A DataModule standardizes the training, val, test splits, data preparation and transforms.
The main advantage is consistent data splits, data preparation and transforms across models.
Expand Down Expand Up @@ -398,3 +325,62 @@ def test_dataloader():
if test_dataset is not None:
datamodule.test_dataloader = test_dataloader
return datamodule

    def __new__(cls, *args: Any, **kwargs: Any) -> 'LightningDataModule':
        obj = super().__new__(cls)
        # track `DataHooks` calls and run `prepare_data` only on rank zero
        obj.prepare_data = cls._track_data_hook_calls(obj, rank_zero_only(obj.prepare_data))
        obj.setup = cls._track_data_hook_calls(obj, obj.setup)
        obj.teardown = cls._track_data_hook_calls(obj, obj.teardown)
        return obj

    @staticmethod
    def _track_data_hook_calls(obj: 'LightningDataModule', fn: callable) -> callable:
        """A decorator that checks if prepare_data/setup/teardown has been called.

        - When ``dm.prepare_data()`` is called, ``dm.has_prepared_data`` gets set to True
        - When ``dm.setup()``, ``dm.has_setup_{fit,validate,test}`` get set to True
        - When ``dm.setup(stage)`` is called, where stage is any of ``{fit,validate,test,predict}``.
          Its corresponding `dm_has_setup_{stage}` attribute gets set to True
        - ``dm.teardown()`` and ``dm.teardown(stage)`` act exactly like ``dm.setup``

        Args:
            obj: Object whose function will be tracked
            fn: Function that will be tracked to see if it has been called.

        Returns:
            Decorated function that tracks its call status and saves it to private attrs in its obj instance.
        """

        @functools.wraps(fn)
        def wrapped_fn(*args: str, **kwargs: Optional[str]) -> Any:
            name = fn.__name__

            # If calling setup, we check the stage and assign stage-specific bool args
            if name in ("setup", "teardown"):

                # Get stage either by grabbing from args or checking kwargs.
                # If not provided, set call status of 'fit', 'validate', and 'test' to True.
                # We do this so __attach_datamodule in trainer.py doesn't mistakenly call
                # setup('test') on trainer.test()
                stage = args[0] if len(args) else kwargs.get("stage", None)

                if stage is None:
                    for s in ("fit", "validate", "test"):
                        setattr(obj, f"_has_{name}_{s}", True)
                else:
                    setattr(obj, f"_has_{name}_{stage}", True)

            elif name == "prepare_data":
                obj._has_prepared_data = True

            return fn(*args, **kwargs)

        return wrapped_fn

    def __getstate__(self) -> dict:
        # avoids _pickle.PicklingError: Can't pickle <...>: it's not the same object as <...>
        d = self.__dict__.copy()
        for fn in ("prepare_data", "setup", "teardown"):
            del d[fn]
        return d
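
A short usage sketch of the tracking behaviour documented above; this is an editor's illustration, not part of the changed file. The MyDataModule subclass is hypothetical, and the has_setup_* properties are the public accessors backed by the _has_setup_* attributes that the wrapper sets:

# Illustrative only: what the __new__-based hook tracking above looks like from the outside.
from pytorch_lightning import LightningDataModule


class MyDataModule(LightningDataModule):  # hypothetical minimal subclass
    def setup(self, stage=None):
        pass


dm = MyDataModule()
assert not dm.has_setup_fit    # nothing has been called yet
dm.setup("fit")                # wrapped by _track_data_hook_calls via __new__
assert dm.has_setup_fit        # the _has_setup_fit flag was set for the requested stage
assert not dm.has_setup_test   # other stages stay unmarked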
