Update docs #1656

Merged
merged 4 commits into from
Apr 29, 2020
57 changes: 19 additions & 38 deletions docs/source/fast_training.rst
@@ -42,45 +42,26 @@ Must use an int if using an IterableDataset.
# check every 100 train batches (e.g., for IterableDatasets or a fixed frequency)
trainer = Trainer(val_check_interval=100)

Use training data subset
------------------------
If you don't want to check 100% of the training set (for debugging or if it's huge), set this flag.
Use data subset for training, validation and test
-------------------------------------------------
If you don't want to use 100% of the training, validation, or test set (for debugging, or because the dataset is huge), set these flags.

.. code-block:: python

# DEFAULT
trainer = Trainer(train_percent_check=1.0)

# check 10% only
trainer = Trainer(train_percent_check=0.1)

.. note:: ``train_percent_check`` will be overwritten by ``overfit_pct`` if ``overfit_pct`` > 0.

Use test data subset
--------------------
If you don't want to check 100% of the test set (for debugging or if it's huge), set this flag.

.. code-block:: python

# DEFAULT
trainer = Trainer(test_percent_check=1.0)

# check 10% only
trainer = Trainer(test_percent_check=0.1)

.. note:: ``test_percent_check`` will be overwritten by ``overfit_pct`` if ``overfit_pct`` > 0.

Use validation data subset
--------------------------
If you don't want to check 100% of the validation set (for debugging or if it's huge), set this flag.

.. code-block:: python

# DEFAULT
trainer = Trainer(val_percent_check=1.0)

# check 10% only
trainer = Trainer(val_percent_check=0.1)

.. note:: ``val_percent_check`` will be overwritten by ``overfit_pct`` if ``overfit_pct`` > 0 and ignored if
``fast_dev_run=True``.
trainer = Trainer(
train_percent_check=1.0,
val_percent_check=1.0,
test_percent_check=1.0
)

# check only 10%, 20% and 30% of the training, validation and test sets, respectively
trainer = Trainer(
train_percent_check=0.1,
val_percent_check=0.2,
test_percent_check=0.3
)

.. note:: ``train_percent_check``, ``val_percent_check`` and ``test_percent_check`` will be overwritten by ``overfit_pct`` if ``overfit_pct`` > 0. ``val_percent_check`` will be ignored if ``fast_dev_run=True``.

.. note:: If you set ``val_percent_check=0``, validation will be disabled.
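
As a rough illustration of what these fractions mean in practice, the sketch below converts a fraction into a batch count. The helper ``batches_to_check`` is hypothetical, and the truncation rule is an assumption made for the example, not Lightning's actual implementation.

```python
def batches_to_check(total_batches: int, percent_check: float) -> int:
    """Hypothetical helper: how many batches a given fraction would cover."""
    if not 0.0 <= percent_check <= 1.0:
        raise ValueError("percent_check must be between 0.0 and 1.0")
    # Assumption: the fraction is applied to the batch count and truncated.
    return int(total_batches * percent_check)


if __name__ == "__main__":
    # With 1000 batches per split, mirror the flags used in the docs example.
    for name, fraction in [("train", 0.1), ("val", 0.2), ("test", 0.3)]:
        print(name, batches_to_check(1000, fraction))
    # A fraction of 0 leaves no batches, consistent with validation being disabled.
    print("val disabled:", batches_to_check(1000, 0.0) == 0)
```

Under this assumption, a small fraction of a small dataset can round down to zero batches, which is worth keeping in mind when debugging with tiny datasets.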
9 changes: 4 additions & 5 deletions docs/source/slurm.rst
@@ -11,17 +11,14 @@ To train a model using multiple-nodes do the following:

1. Design your LightningModule.

2. Add `torch.DistributedSampler <https://pytorch.org/docs/stable/data.html#torch.utils.data.distributed.DistributedSampler>`_
which enables access to a subset of your full dataset to each GPU.

3. Enable ddp in the trainer
2. Enable ddp in the trainer

.. code-block:: python

# train on 32 GPUs across 4 nodes
trainer = Trainer(gpus=8, num_nodes=4, distributed_backend='ddp')

4. It's a good idea to structure your train.py file like this:
3. It's a good idea to structure your train.py file like this:

.. code-block:: python

@@ -91,6 +88,8 @@ To train a model using multiple-nodes do the following:

sbatch submit.sh

.. note:: Lightning already sets up :class:`~torch.utils.data.distributed.DistributedSampler` for you, so you do not need to add it yourself.
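
To illustrate what a distributed sampler takes care of, here is a simplified, pure-Python sketch of splitting dataset indices across processes. The function ``shard_indices`` is hypothetical and written only for this example. It omits shuffling and is not the PyTorch or Lightning implementation. It only shows the general idea that each process sees an equal-sized, interleaved slice of the data.

```python
import math
from typing import List


def shard_indices(dataset_len: int, num_replicas: int, rank: int) -> List[int]:
    """Hypothetical helper: indices that one process (rank) would iterate over."""
    if dataset_len <= 0 or num_replicas <= 0:
        raise ValueError("dataset_len and num_replicas must be positive")
    if not 0 <= rank < num_replicas:
        raise ValueError("rank must be in [0, num_replicas)")

    indices = list(range(dataset_len))
    # Every rank must receive the same number of samples, so round up ...
    per_replica = math.ceil(dataset_len / num_replicas)
    total = per_replica * num_replicas
    # ... and pad by repeating indices from the start of the dataset.
    repeats = math.ceil(total / dataset_len)
    padded = (indices * repeats)[:total]
    # Interleave: rank r takes positions r, r + num_replicas, r + 2 * num_replicas, ...
    return padded[rank::num_replicas]


if __name__ == "__main__":
    # 10 samples over 4 processes: each rank gets 3 indices, two of them padded.
    for rank in range(4):
        print(rank, shard_indices(10, 4, rank))
```

In the 4-node example above, the number of replicas would be 32 (8 GPUs on each of 4 nodes), with one rank per GPU.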

Walltime auto-resubmit
-----------------------------------
When you use Lightning on a SLURM cluster, Lightning automatically detects when it is about