
Commit

Merge branch 'master' into trainer-state-refactor
carmocca authored May 4, 2021
2 parents 05cd4fc + a6aa1a0 commit c3f6354
Showing 27 changed files with 641 additions and 336 deletions.
85 changes: 85 additions & 0 deletions .azure-pipelines/ipu-tests.yml
@@ -0,0 +1,85 @@
trigger:
  tags:
    include:
      - '*'
  branches:
    include:
      - master
      - release/*
      - refs/tags/*
pr:
  - master
  - release/*

variables:
  - name: poplar_sdk
    value: "poplar_sdk-ubuntu_20_04-2.0.0+481-79b41f85d1"

jobs:
  - job: ipu

    pool: graphcore-ipus

    workspace:
      clean: all

    steps:
      - script: tar -xvzf /opt/poplar/${{ variables.poplar_sdk }}.tar.gz
        displayName: "Extract Poplar SDK"

      - script: |
          set -eux
          pip debug --verbose
          pip install ${{ variables.poplar_sdk }}/poptorch-*ubuntu*.whl
        displayName: "Install poptorch"
      - script: |
          set -eux
          source ${{ variables.poplar_sdk }}/poplar-ubuntu*/enable.sh
          NUM_IPUS=$(gc-info --ipu-count)
          if [[ -z "${NUM_IPUS}" ]] || [[ "${NUM_IPUS}" -eq 0 ]]; then
              echo "No IPUs found to reset. Exiting"
              exit 1
          fi
          echo "Resetting parity on ${NUM_IPUS} IPU devices"
          i=0
          while [[ i -lt "${NUM_IPUS}" ]]; do
              gc-reset -d "${i}"
              i=$((i + 1))
          done
        displayName: "Reset IPU devices"
      - bash: |
          export GIT_TERMINAL_PROMPT=1
          pip install --requirement requirements.txt
          python -c "fname = 'requirements/extra.txt' ; lines = [line for line in open(fname).readlines() if 'fairscale' not in line] ; open(fname, 'w').writelines(lines)"
          python -c "fname = 'requirements/extra.txt' ; lines = [line for line in open(fname).readlines() if 'horovod' not in line] ; open(fname, 'w').writelines(lines)"
          pip install --requirement ./requirements/devel.txt --upgrade-strategy only-if-needed
          pip list
        displayName: 'Install dependencies'
      - bash: |
          python tests/collect_env_details.py
          python -c "import torch"
        displayName: 'Env details'
      - script: |
          set -eux
          source ${{ variables.poplar_sdk }}/poplar-ubuntu*/enable.sh
          source ${{ variables.poplar_sdk }}/popart-ubuntu*/enable.sh
          python -c "import poptorch; print(poptorch.__version__)"
        displayName: "Check poptorch installation"
      - bash: |
          wget https://pl-public-data.s3.amazonaws.com/legacy/checkpoints.zip -P legacy/
          unzip -o legacy/checkpoints.zip -d legacy/
          ls -l legacy/checkpoints/
        displayName: 'Get legacy checkpoints'
      - bash: |
          source ${{ variables.poplar_sdk }}/poplar-ubuntu*/enable.sh
          source ${{ variables.poplar_sdk }}/popart-ubuntu*/enable.sh
          python -m coverage run --source pytorch_lightning -m pytest pytorch_lightning tests -v --junitxml=$(Build.StagingDirectory)/test-results.xml --durations=50
        displayName: 'Testing: standard'
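
The "Install dependencies" step above prunes fairscale and horovod from requirements/extra.txt with inline python -c one-liners; the base-xla Dockerfile further down in this diff uses the same pattern. As a hedged illustration, the same filtering logic could be written as a small standalone helper — the filter_requirements name and the __main__ wiring below are illustrative additions, not code from the repository:

# Illustrative sketch only: the requirement filtering that the CI step performs
# with inline "python -c" commands, written as a reusable helper.
from typing import Sequence


def filter_requirements(fname: str, excluded: Sequence[str]) -> None:
    # Rewrite `fname`, dropping every line that mentions one of the excluded packages.
    with open(fname) as fp:
        lines = fp.readlines()
    kept = [line for line in lines if not any(pkg in line for pkg in excluded)]
    with open(fname, "w") as fp:
        fp.writelines(kept)


if __name__ == "__main__":
    # equivalent to the two one-liners in the "Install dependencies" step above
    filter_requirements("requirements/extra.txt", ["fairscale", "horovod"])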
2 changes: 1 addition & 1 deletion .github/workflows/ci_dockers.yml
@@ -10,7 +10,7 @@ on: # Trigger the workflow on push or pull request, but only for the master bran
paths:
- "dockers/**"
- "!dockers/README.md"
- "requirements/*.txt"
- "requirements/*"
- "environment.yml"
- "requirements.txt"
- ".github/workflows/*docker*.yml"
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -196,6 +196,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- `LightningModule.from_datasets()` now accepts `IterableDataset` instances as training datasets. ([#7503](https://github.com/PyTorchLightning/pytorch-lightning/pull/7503))


- Changed `resume_from_checkpoint` warning to an error when the checkpoint file does not exist ([#7075](https://github.com/PyTorchLightning/pytorch-lightning/pull/7075))


### Deprecated


@@ -246,6 +249,10 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
[#6659](https://github.com/PyTorchLightning/pytorch-lightning/pull/6659),
)

- Deprecated the `LightningModule.datamodule` getter and setter methods; access them through `Trainer.datamodule` instead ([#7168](https://github.com/PyTorchLightning/pytorch-lightning/pull/7168))


- Deprecated the use of `Trainer(gpus="i")` (string) for selecting the i-th GPU; from v1.5 this will set the number of GPUs instead of the index ([#6388](https://github.com/PyTorchLightning/pytorch-lightning/pull/6388))

### Removed

@@ -362,6 +369,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Fixed bug where `BaseFinetuning.flatten_modules()` was duplicating leaf node parameters ([#6879](https://github.com/PyTorchLightning/pytorch-lightning/pull/6879))


- Fixed bug where the learning rate schedulers did not follow the optimizer frequencies ([#4868](https://github.com/PyTorchLightning/pytorch-lightning/pull/4868))


- Fixed `EarlyStopping` logic when `min_epochs` or `min_steps` requirement is not met ([#6705](https://github.com/PyTorchLightning/pytorch-lightning/pull/6705))


@@ -455,6 +465,10 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Fixed optimizer `state` not moved to `GPU` ([#7277](https://github.com/PyTorchLightning/pytorch-lightning/pull/7277))


- Fixed custom init args for `WandbLogger` ([#6989](https://github.com/PyTorchLightning/pytorch-lightning/pull/6989))



## [1.2.7] - 2021-04-06

### Fixed
21 changes: 9 additions & 12 deletions dockers/base-xla/Dockerfile
@@ -29,7 +29,7 @@ ENV \
DEBIAN_FRONTEND=noninteractive \
CONDA_ENV=lightning

# show system inforation
# show system info
RUN lsb_release -a && cat /etc/*-release

RUN apt-get update -qq && \
@@ -42,13 +42,13 @@ RUN apt-get update -qq && \
ca-certificates \
libomp5 \
&& \
# Install conda and python.
# NOTE new Conda does not forward the exit status... https://github.com/conda/conda/issues/8385
# Install conda and python.
# NOTE new Conda does not forward the exit status... https://github.com/conda/conda/issues/8385
curl -o ~/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-py38_${CONDA_VERSION}-Linux-x86_64.sh && \
chmod +x ~/miniconda.sh && \
~/miniconda.sh -b && \
rm ~/miniconda.sh && \
# Cleaning
# Cleaning
apt-get autoremove -y && \
apt-get clean && \
rm -rf /root/.cache && \
@@ -79,7 +79,7 @@ ENV \
RUN pip --version && \
pip config set global.cache-dir false && \
conda remove pytorch torchvision && \
# Install Pytorch XLA
# Install Pytorch XLA
py_version=${PYTHON_VERSION/./} && \
# Python 3.7 wheels are available. Replace cp36-cp36m with cp37-cp37m
gsutil cp "gs://tpu-pytorch/wheels/torch-${XLA_VERSION}-cp${py_version}-cp${py_version}m-linux_x86_64.whl" . && \
@@ -91,20 +91,17 @@ RUN pip --version && \
# Get package
COPY ./ ./pytorch-lightning/

# Install pytorch-lightning dependencies.
RUN \
python --version && \
# Install PL dependencies
cd pytorch-lightning && \
# drop Torch as it was installed with XLA
# drop packages installed with XLA
python -c "fname = 'requirements.txt' ; lines = [line for line in open(fname).readlines() if not line.startswith('torch')] ; open(fname, 'w').writelines(lines)" && \
# drop Horovod as it is not needed
python -c "fname = 'requirements/examples.txt' ; lines = [line for line in open(fname).readlines() if not line.startswith('torchvision')] ; open(fname, 'w').writelines(lines)" && \
# drop unnecessary packages
python -c "fname = 'requirements/extra.txt' ; lines = [line for line in open(fname).readlines() if not line.startswith('horovod')] ; open(fname, 'w').writelines(lines)" && \
# drop fairscale as it is not needed
python -c "fname = 'requirements/extra.txt' ; lines = [line for line in open(fname).readlines() if 'fairscale' not in line] ; open(fname, 'w').writelines(lines)" && \
# drop TorchVision as it was installed with XLA
python -c "fname = 'requirements/examples.txt' ; lines = [line for line in open(fname).readlines() if not line.startswith('torchvision')] ; open(fname, 'w').writelines(lines)" && \
python ./requirements/adjust_versions.py ./requirements/extra.txt && \
# install PL dependencies
pip install --requirement ./requirements/devel.txt --no-cache-dir && \
cd .. && \
rm -rf pytorch-lightning && \
6 changes: 5 additions & 1 deletion docs/source/advanced/multi_gpu.rst
@@ -226,13 +226,17 @@ Note in particular the difference between `gpus=0`, `gpus=[0]` and `gpus="0"`.
+---------------+-----------+---------------------+---------------------------------+
| "0"           | str       | [0]                 | GPU 0                           |
+---------------+-----------+---------------------+---------------------------------+
| "3"           | str       | [3]                 | GPU 3                           |
| "3"           | str       | [3]                 | GPU 3 (will change in v1.5)     |
+---------------+-----------+---------------------+---------------------------------+
| "1, 3"        | str       | [1, 3]              | GPUs 1 and 3                    |
+---------------+-----------+---------------------+---------------------------------+
| "-1"          | str       | [0, 1, 2, ...]      | all available GPUs              |
+---------------+-----------+---------------------+---------------------------------+

.. warning::
    The behavior for :code:`gpus="3"` (str) will change. Currently it selects the GPU with index 3, but will
    select the first 3 GPUs from v1.5.
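
To make the table and warning above concrete, here is a hedged sketch (an editor's illustration, not part of the documentation file) of the corresponding Trainer calls; it assumes a machine with at least four visible GPUs, and only the plain-string form changes meaning in v1.5:

# Illustrative only: the `gpus` values from the table above. Assumes >= 4 GPUs are visible.
from pytorch_lightning import Trainer

Trainer(gpus=3)        # int  -> uses the first 3 GPUs: [0, 1, 2]
Trainer(gpus=[1, 3])   # list -> uses GPUs 1 and 3
Trainer(gpus="1, 3")   # str  -> uses GPUs 1 and 3
Trainer(gpus="-1")     # str  -> uses all available GPUs
Trainer(gpus="3")      # str  -> GPU 3 today; from v1.5 this selects the first 3 GPUs instead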

.. note::

    When specifying number of gpus as an integer ``gpus=k``, setting the trainer flag
2 changes: 1 addition & 1 deletion docs/source/common/lightning_module.rst
@@ -994,7 +994,7 @@ Set or access your datamodule.
.. code-block:: python

    def configure_optimizers(self):
        num_training_samples = len(self.datamodule.train_dataloader())
        num_training_samples = len(self.trainer.datamodule.train_dataloader())
        ...
--------------
2 changes: 1 addition & 1 deletion docs/source/common/trainer.rst
@@ -870,7 +870,7 @@ logger

|
:doc:`Logger <../common/loggers>` (or iterable collection of loggers) for experiment tracking.
:doc:`Logger <../common/loggers>` (or iterable collection of loggers) for experiment tracking. A ``True`` value uses the default ``TensorBoardLogger`` shown below. ``False`` will disable logging.

.. testcode::

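As a hedged illustration of the sentence above (the elided testcode block in the file shows the default configuration), the three common values of the logger flag could be passed like this; the save_dir and name arguments are placeholders:

# Illustrative sketch: common values for the Trainer `logger` flag.
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import TensorBoardLogger

Trainer(logger=True)   # default TensorBoardLogger, as in the docs example
Trainer(logger=False)  # disable logging entirely
Trainer(logger=TensorBoardLogger(save_dir="logs/", name="my_run"))  # explicit logger instance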
134 changes: 60 additions & 74 deletions pytorch_lightning/core/datamodule.py
@@ -24,80 +24,7 @@
from pytorch_lightning.utilities.argparse import add_argparse_args, from_argparse_args, get_init_arguments_and_types


class _DataModuleWrapper(type):

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self.__has_added_checks = False

    def __call__(cls, *args, **kwargs):
        """A wrapper for LightningDataModule that:

        1. Runs user defined subclass's __init__
        2. Assures prepare_data() runs on rank 0
        3. Lets you check prepare_data and setup to see if they've been called
        """
        if not cls.__has_added_checks:
            cls.__has_added_checks = True
            # Track prepare_data calls and make sure it runs on rank zero
            cls.prepare_data = track_data_hook_calls(rank_zero_only(cls.prepare_data))
            # Track setup calls
            cls.setup = track_data_hook_calls(cls.setup)
            # Track teardown calls
            cls.teardown = track_data_hook_calls(cls.teardown)

        # Get instance of LightningDataModule by mocking its __init__ via __call__
        obj = type.__call__(cls, *args, **kwargs)

        return obj


def track_data_hook_calls(fn):
    """A decorator that checks if prepare_data/setup/teardown has been called.

    - When ``dm.prepare_data()`` is called, ``dm.has_prepared_data`` gets set to True
    - When ``dm.setup()``, ``dm.has_setup_{fit,validate,test}`` get set to True
    - When ``dm.setup(stage)`` is called, where stage is any of ``{fit,validate,test,predict}``.
      Its corresponding `dm_has_setup_{stage}` attribute gets set to True
    - ``dm.teardown()`` and ``dm.teardown(stage)`` act exactly like ``dm.setup``

    Args:
        fn (function): Function that will be tracked to see if it has been called.

    Returns:
        function: Decorated function that tracks its call status and saves it to private attrs in its obj instance.
    """

    @functools.wraps(fn)
    def wrapped_fn(*args, **kwargs):

        # The object instance from which setup or prepare_data was called
        obj = args[0]
        name = fn.__name__

        # If calling setup, we check the stage and assign stage-specific bool args
        if name in ("setup", "teardown"):

            # Get stage either by grabbing from args or checking kwargs.
            # If not provided, set call status of 'fit', 'validate', and 'test' to True.
            # We do this so __attach_datamodule in trainer.py doesn't mistakenly call setup('test') on trainer.test()
            stage = args[1] if len(args) > 1 else kwargs.get("stage", None)

            if stage is None:
                for s in ("fit", "validate", "test"):
                    setattr(obj, f"_has_{name}_{s}", True)
            else:
                setattr(obj, f"_has_{name}_{stage}", True)

        elif name == "prepare_data":
            obj._has_prepared_data = True

        return fn(*args, **kwargs)

    return wrapped_fn


class LightningDataModule(CheckpointHooks, DataHooks, metaclass=_DataModuleWrapper):
class LightningDataModule(CheckpointHooks, DataHooks):
"""
A DataModule standardizes the training, val, test splits, data preparation and transforms.
The main advantage is consistent data splits, data preparation and transforms across models.
Expand Down Expand Up @@ -398,3 +325,62 @@ def test_dataloader():
if test_dataset is not None:
datamodule.test_dataloader = test_dataloader
return datamodule

    def __new__(cls, *args: Any, **kwargs: Any) -> 'LightningDataModule':
        obj = super().__new__(cls)
        # track `DataHooks` calls and run `prepare_data` only on rank zero
        obj.prepare_data = cls._track_data_hook_calls(obj, rank_zero_only(obj.prepare_data))
        obj.setup = cls._track_data_hook_calls(obj, obj.setup)
        obj.teardown = cls._track_data_hook_calls(obj, obj.teardown)
        return obj

    @staticmethod
    def _track_data_hook_calls(obj: 'LightningDataModule', fn: callable) -> callable:
        """A decorator that checks if prepare_data/setup/teardown has been called.

        - When ``dm.prepare_data()`` is called, ``dm.has_prepared_data`` gets set to True
        - When ``dm.setup()``, ``dm.has_setup_{fit,validate,test}`` get set to True
        - When ``dm.setup(stage)`` is called, where stage is any of ``{fit,validate,test,predict}``.
          Its corresponding `dm_has_setup_{stage}` attribute gets set to True
        - ``dm.teardown()`` and ``dm.teardown(stage)`` act exactly like ``dm.setup``

        Args:
            obj: Object whose function will be tracked
            fn: Function that will be tracked to see if it has been called.

        Returns:
            Decorated function that tracks its call status and saves it to private attrs in its obj instance.
        """

        @functools.wraps(fn)
        def wrapped_fn(*args: str, **kwargs: Optional[str]) -> Any:
            name = fn.__name__

            # If calling setup, we check the stage and assign stage-specific bool args
            if name in ("setup", "teardown"):

                # Get stage either by grabbing from args or checking kwargs.
                # If not provided, set call status of 'fit', 'validate', and 'test' to True.
                # We do this so __attach_datamodule in trainer.py doesn't mistakenly call
                # setup('test') on trainer.test()
                stage = args[0] if len(args) else kwargs.get("stage", None)

                if stage is None:
                    for s in ("fit", "validate", "test"):
                        setattr(obj, f"_has_{name}_{s}", True)
                else:
                    setattr(obj, f"_has_{name}_{stage}", True)

            elif name == "prepare_data":
                obj._has_prepared_data = True

            return fn(*args, **kwargs)

        return wrapped_fn

    def __getstate__(self) -> dict:
        # avoids _pickle.PicklingError: Can't pickle <...>: it's not the same object as <...>
        d = self.__dict__.copy()
        for fn in ("prepare_data", "setup", "teardown"):
            del d[fn]
        return d
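
A short usage sketch of the tracking behaviour documented above; this is an editor's illustration, not part of the changed file. The MyDataModule subclass is hypothetical, and the has_setup_* properties are the public accessors backed by the _has_setup_* attributes that the wrapper sets:

# Illustrative only: what the __new__-based hook tracking above looks like from the outside.
from pytorch_lightning import LightningDataModule


class MyDataModule(LightningDataModule):  # hypothetical minimal subclass
    def setup(self, stage=None):
        pass


dm = MyDataModule()
assert not dm.has_setup_fit    # nothing has been called yet
dm.setup("fit")                # wrapped by _track_data_hook_calls via __new__
assert dm.has_setup_fit        # the _has_setup_fit flag was set for the requested stage
assert not dm.has_setup_test   # other stages stay unmarked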
