This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Adds a template task and docs #306

Merged: 60 commits into master from feature/template, May 19, 2021

Commits (60)
f5e3c49  Initial commit (ethanwharris, May 14, 2021)
4cccd79  Updates (ethanwharris, May 14, 2021)
838e40c  Updates (ethanwharris, May 14, 2021)
49de5a0  Merge branch 'master' into feature/template (ethanwharris, May 17, 2021)
7735349  Updates (ethanwharris, May 17, 2021)
ce2108f  Remove template README (ethanwharris, May 17, 2021)
04b3b96  Fixes (ethanwharris, May 17, 2021)
f79f909  Updates (ethanwharris, May 17, 2021)
9834a47  Add examples (ethanwharris, May 17, 2021)
53a1ba2  Updates (ethanwharris, May 17, 2021)
2694f46  Updates (ethanwharris, May 17, 2021)
28b5eec  Updates (ethanwharris, May 17, 2021)
c552635  Updates (ethanwharris, May 17, 2021)
65f9bdd  Add tests (ethanwharris, May 17, 2021)
cc3001a  Updates (ethanwharris, May 17, 2021)
b4102f0  Merge branch 'master' into feature/template (ethanwharris, May 17, 2021)
4ae69fa  Fixes (ethanwharris, May 17, 2021)
3bcf221  A fix (ethanwharris, May 17, 2021)
eb7c3e4  Fixes (ethanwharris, May 17, 2021)
839c99a  More tests (ethanwharris, May 17, 2021)
e2df1ee  Merge branch 'master' into feature/template (ethanwharris, May 17, 2021)
afe8142  Updates (ethanwharris, May 17, 2021)
bee8bdd  Fix (ethanwharris, May 17, 2021)
382c2cb  Merge branch 'master' into feature/template (ethanwharris, May 18, 2021)
3a24117  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], May 18, 2021)
acd302e  Update docs/source/reference/template.rst (ethanwharris, May 19, 2021)
907c927  Respond to comments (ethanwharris, May 19, 2021)
0af0d28  updates (ethanwharris, May 19, 2021)
9a2be0e  Update docs/source/template/data.rst (ethanwharris, May 19, 2021)
9740580  Update docs/source/template/data.rst (ethanwharris, May 19, 2021)
b6d57a2  Update docs/source/template/data.rst (ethanwharris, May 19, 2021)
084eb6e  Merge branch 'master' into feature/template (ethanwharris, May 19, 2021)
e390d32  Update docs/source/template/model.rst (ethanwharris, May 19, 2021)
166dd4d  Updates (ethanwharris, May 19, 2021)
2f52577  Merge branch 'feature/template' of https://github.com/PyTorchLightnin… (ethanwharris, May 19, 2021)
0c0780c  Updates (ethanwharris, May 19, 2021)
3fdba4a  Fixes (ethanwharris, May 19, 2021)
fa6ba79  Updates (ethanwharris, May 19, 2021)
7b201b3  Updates (ethanwharris, May 19, 2021)
96df2c2  Updates (ethanwharris, May 19, 2021)
9a9cfd4  Fixes (ethanwharris, May 19, 2021)
fe2cff7  Fixes (ethanwharris, May 19, 2021)
8167884  Fix (ethanwharris, May 19, 2021)
ad976f4  Add backbones (ethanwharris, May 19, 2021)
fecb316  Add backbones (ethanwharris, May 19, 2021)
b4d952c  Updates (ethanwharris, May 19, 2021)
c7b7806  Updates (ethanwharris, May 19, 2021)
23f2f20  Updates (ethanwharris, May 19, 2021)
ba83757  Fixes (ethanwharris, May 19, 2021)
1c43ec9  Add links (ethanwharris, May 19, 2021)
5aa7cf5  Fixes (ethanwharris, May 19, 2021)
4d35762  Simplify (ethanwharris, May 19, 2021)
36a6538  Update CHANGELOG.md (ethanwharris, May 19, 2021)
17085fb  Merge branch 'master' into feature/template (mergify[bot], May 19, 2021)
4550e04  Update docs/source/template/optional.rst (ethanwharris, May 19, 2021)
5ebc71e  Update docs/source/template/optional.rst (ethanwharris, May 19, 2021)
71de79c  Update docs/source/template/task.rst (ethanwharris, May 19, 2021)
0850333  Updates (ethanwharris, May 19, 2021)
4fd0344  Updates (ethanwharris, May 19, 2021)
c21e816  Updates (ethanwharris, May 19, 2021)
1 change: 1 addition & 0 deletions docs/source/conf.py
@@ -90,6 +90,7 @@ def _load_py_module(fname, pkg="flash"):
"torch": ("https://pytorch.org/docs/stable/", None),
"numpy": ("https://docs.scipy.org/doc/numpy/", None),
"PIL": ("https://pillow.readthedocs.io/en/stable/", None),
"pytorchvideo": ("https://pytorchvideo.readthedocs.io/en/latest/", None),
"pytorch_lightning": ("https://pytorch-lightning.readthedocs.io/en/stable/", None),
}

1 change: 1 addition & 0 deletions docs/source/custom_task.rst
@@ -4,6 +4,7 @@ Tutorial: Creating a Custom Task
In this tutorial we will go over the process of creating a custom :class:`~flash.core.model.Task`,
along with a custom :class:`~flash.core.data.data_module.DataModule`.

.. note:: This tutorial is only intended to help you create a small custom task for a personal project. If you want a more detailed guide, have a look at our :ref:`guide on contributing a task to Flash <contributing>`.

The tutorial objective is to create a ``RegressionTask`` to learn to predict if someone has ``diabetes`` or not.
We will use the ``scikit-learn`` `Diabetes dataset <https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset>`__.
18 changes: 18 additions & 0 deletions docs/source/index.rst
@@ -50,6 +50,24 @@ Lightning Flash
general/finetuning
general/predictions


.. toctree::
:maxdepth: 1
:caption: Contributing a Task

template/intro
template/data
template/task
template/optional
template/examples
template/tests
template/docs

.. toctree::
:hidden:

reference/template

Indices and tables
==================

75 changes: 75 additions & 0 deletions docs/source/reference/template.rst
@@ -0,0 +1,75 @@

.. _template:

########
Template
########

********
The task
********

Here you should add a description of your task. For example:
Classification is the task of assigning one of a number of classes to each data point.
The :class:`~flash.template.TemplateSKLearnClassifier` is a :class:`~flash.core.model.Task` for classifying the datasets included with scikit-learn.

------

*********
Inference
*********

Here, you should add a short intro to your predict example, and then use ``literalinclude`` to add it.

.. note:: We skip the first 14 lines as they are just the copyright notice.

Our predict example uses a model pre-trained on the iris data.

.. literalinclude:: ../../../flash_examples/predict/template.py
:language: python
:lines: 14-
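
For reference, here's a rough, hypothetical sketch of what such a predict example can look like (the checkpoint path and input values are made up; the real code lives in ``flash_examples/predict/template.py``):

.. code-block:: python

    import numpy as np

    from flash.template import TemplateSKLearnClassifier

    # 1. Load a finetuned model (the checkpoint path is illustrative only).
    model = TemplateSKLearnClassifier.load_from_checkpoint("template_model.pt")

    # 2. Predict on a few rows of four features each (as in the iris data).
    predictions = model.predict(np.random.rand(3, 4))
    print(predictions)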

For more advanced inference options, see :ref:`predictions`.

------

********
Training
********

In this section, we briefly describe the data, and then ``literalinclude`` our finetuning example.

Now we'll train on Fisher's classic iris data.
It contains 150 records with four features (sepal length, sepal width, petal length, and petal width) in three classes (species of Iris: setosa, virginica and versicolor).
Review comment (Contributor): Include link to images to make your description better.

Reply (ethanwharris, Author): This is just tabular data, so I'm not sure what images we would show here.

Now all we need is to train our task!

.. literalinclude:: ../../../flash_examples/finetuning/template.py
:language: python
:lines: 14-
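
Again, the included file isn't shown on this page, so here's a hedged sketch of the overall flow (argument names such as ``train_bunch`` are assumptions based on the ``from_sklearn`` method described in the data guide; the real code lives in ``flash_examples/finetuning/template.py``):

.. code-block:: python

    from sklearn import datasets

    import flash
    from flash.template import TemplateData, TemplateSKLearnClassifier

    # 1. Load the iris data into a DataModule with the custom ``from_sklearn`` method.
    datamodule = TemplateData.from_sklearn(
        train_bunch=datasets.load_iris(),
        val_split=0.1,
    )

    # 2. Build the task using the metadata exposed by the DataModule.
    model = TemplateSKLearnClassifier(
        num_features=datamodule.num_features,
        num_classes=datamodule.num_classes,
    )

    # 3. Create the trainer and train the model.
    trainer = flash.Trainer(max_epochs=1)
    trainer.fit(model, datamodule=datamodule)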

------

*************
API reference
*************

We usually include the API reference for the :class:`~flash.core.model.Task` and :class:`~flash.core.data.data_module.DataModule`.
You can optionally add the other classes you've implemented.
To add the API reference, use the ``autoclass`` directive.

.. _template_classifier:

TemplateSKLearnClassifier
-------------------------

.. autoclass:: flash.template.TemplateSKLearnClassifier
:members:
:exclude-members: forward

.. _template_data:

TemplateData
------------

.. autoclass:: flash.template.TemplateData
217 changes: 217 additions & 0 deletions docs/source/template/data.rst
@@ -0,0 +1,217 @@
.. _contributing_data:

********
The Data
********

The first step to contributing a task is to implement the classes we need to load some data.
Inside ``data.py`` you should implement:

#. some :class:`~flash.core.data.data_source.DataSource` classes *(optional)*
#. a :class:`~flash.core.data.process.Preprocess`
#. a :class:`~flash.core.data.data_module.DataModule`
#. a :class:`~flash.core.data.callbacks.BaseVisualization` *(optional)*
#. a :class:`~flash.core.data.process.Postprocess` *(optional)*

DataSource
^^^^^^^^^^

The :class:`~flash.core.data.data_source.DataSource` class contains the logic for data loading from different sources such as folders, files, tensors, etc.
If you just want to support :meth:`~flash.core.data.data_module.DataModule.from_datasets` you won't need a :class:`~flash.core.data.data_source.DataSource`, but if you want to support a few different ways of loading data for your task, the more the merrier!
Each :class:`~flash.core.data.data_source.DataSource` has two methods:

- :meth:`~flash.core.data.data_source.DataSource.load_data` takes some dataset metadata (e.g. a folder name) as input and produces a sequence or iterable of samples or sample metadata.
- :meth:`~flash.core.data.data_source.DataSource.load_sample` then takes as input a single element from the output of ``load_data`` and returns a sample.

By default these methods just return their input, so you don't need both a :meth:`~flash.core.data.data_source.DataSource.load_data` and a :meth:`~flash.core.data.data_source.DataSource.load_sample` to create a :class:`~flash.core.data.data_source.DataSource`.
Where possible, you should override one of our existing :class:`~flash.core.data.data_source.DataSource` classes.
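
To make these two hooks concrete, here's a minimal, hypothetical data source (not part of Flash) that loads one text file per sample from a folder; the class name and folder layout are invented for illustration:

.. code-block:: python

    import os
    from typing import Any, Dict, List, Optional

    from flash.core.data.data_source import DataSource, DefaultDataKeys


    class FolderTextDataSource(DataSource):
        """Hypothetical example: each ``.txt`` file in a folder is one sample."""

        def load_data(self, data: str, dataset: Optional[Any] = None) -> List[Dict[str, Any]]:
            # ``data`` is the metadata passed by the user, here a folder name.
            # We return a sequence of per-sample metadata (the file paths).
            return [
                {DefaultDataKeys.INPUT: os.path.join(data, name)}
                for name in sorted(os.listdir(data))
                if name.endswith(".txt")
            ]

        def load_sample(self, sample: Dict[str, Any], dataset: Optional[Any] = None) -> Dict[str, Any]:
            # ``sample`` is a single element from ``load_data``; load the actual data.
            with open(sample[DefaultDataKeys.INPUT]) as f:
                sample[DefaultDataKeys.INPUT] = f.read()
            return sample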

Let's start by implementing a ``TemplateNumpyDataSource``, which overrides :class:`~flash.core.data.data_source.NumpyDataSource`.
The main :class:`~flash.core.data.data_source.DataSource` method that we have to implement is :meth:`~flash.core.data.data_source.DataSource.load_data`.
As we're extending the ``NumpyDataSource``, we expect the same ``data`` argument (in this case, a tuple containing data and corresponding target arrays).

We can also take the ``dataset`` argument.
Any attributes we set on ``dataset`` will be available on the :class:`~torch.utils.data.Dataset` generated by our :class:`~flash.core.data.data_source.DataSource`.
In this data source, we'll set the ``num_features`` attribute.

Here's the code for our ``TemplateNumpyDataSource.load_data`` method:

.. literalinclude:: ../../../flash/template/classification/data.py
:language: python
:dedent: 4
:pyobject: TemplateNumpyDataSource.load_data

.. note:: Later, when we add :ref:`our DataModule implementation <contributing_data_module>`, we'll make ``num_features`` available to the user.

|

Sometimes you need to do something a bit more custom.
When creating a custom :class:`~flash.core.data.data_source.DataSource`, the type of the ``data`` argument is up to you.
For our template :class:`~flash.core.model.Task`, it would be cool if the user could provide a scikit-learn ``Bunch`` as the data source.
To achieve this, we'll add a ``TemplateSKLearnDataSource`` whose ``load_data`` expects a ``Bunch`` as input.
We extend our ``TemplateNumpyDataSource`` so that we can call ``super`` with the data and targets extracted from the ``Bunch``.
We perform two additional steps here to improve the user experience:

1. We set the ``num_classes`` attribute on the ``dataset``. If ``num_classes`` is set, it is automatically made available as a property of the :class:`~flash.core.data.data_module.DataModule`.
2. We create and set a :class:`~flash.core.data.data_source.LabelsState`. The labels provided here will be shared with the :class:`~flash.core.classification.Labels` serializer, so the user doesn't need to provide them.

Here's the code for the ``TemplateSKLearnDataSource.load_data`` method:

.. literalinclude:: ../../../flash/template/classification/data.py
:language: python
:dedent: 4
:pyobject: TemplateSKLearnDataSource.load_data
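
If the included snippet isn't visible here, the method boils down to something like the following sketch (an approximation of the two steps above, not the verbatim file contents):

.. code-block:: python

    from typing import Any, Optional, Sequence

    from sklearn.utils import Bunch

    from flash.core.data.data_source import LabelsState


    class TemplateSKLearnDataSource(TemplateNumpyDataSource):  # rough re-creation
        def load_data(self, data: Bunch, dataset: Optional[Any] = None) -> Sequence[Any]:
            # 1. Expose ``num_classes`` on the generated dataset (and, through it,
            #    on the DataModule).
            dataset.num_classes = len(data.target_names)

            # 2. Share the class names with the ``Labels`` serializer.
            self.set_state(LabelsState(data.target_names))

            # Delegate to the numpy data source defined above, using the arrays
            # extracted from the ``Bunch``.
            return super().load_data((data.data, data.target), dataset=dataset)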

We can customize the behaviour of our :meth:`~flash.core.data.data_source.DataSource.load_data` for different stages by prepending ``train``, ``val``, ``test``, or ``predict``.
For our ``TemplateSKLearnDataSource``, we don't want to provide any targets to the model when predicting.
We can implement ``predict_load_data`` like this:

.. literalinclude:: ../../../flash/template/classification/data.py
:language: python
:dedent: 4
:pyobject: TemplateSKLearnDataSource.predict_load_data
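
More generally, the naming convention looks like this (a hypothetical sketch just to show the pattern):

.. code-block:: python

    from typing import Any, Optional, Sequence

    from flash.core.data.data_source import DataSource


    class StagePrefixedDataSource(DataSource):
        """Hypothetical example of per-stage ``load_data`` hooks."""

        def load_data(self, data: Sequence[Any], dataset: Optional[Any] = None) -> Sequence[Any]:
            # Fallback used by any stage that doesn't define its own ``*_load_data``.
            return data

        def predict_load_data(self, data: Sequence[Any], dataset: Optional[Any] = None) -> Sequence[Any]:
            # Used only when building the ``predict`` dataset (e.g. to drop targets).
            return data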

Review comment (Contributor): Let me rephrase it to see if I understand it correctly:
A DataSource has a similar function as a Dataset except that it includes preprocessing methods, generates a Dataset when we call load_data, and will generate (possibly different) Datasets for training, validation, etc.

Review comment (Contributor): It may also be useful to understand how it is different from torch.utils.data.DataLoader, since Dataset only requires __getitem__, but DataLoader also does some preprocessing, although I think it does not distinguish between training, validation, etc.
Also similar to https://docs.fast.ai/data.load.html, no?

Reply (ethanwharris, Author): The high-level view is this:

- DataSource is used to generate multiple datasets (e.g. train, test, val, predict)
- The preprocessing methods are stored in Preprocess
- When the dataloader is created, the preprocess transforms are injected into the workers and the model so that they are all called in the right place

So DataSource, Preprocess, and DataPipeline are really just a different way of creating a Dataset and DataLoader (not a replacement). Can't speak to similarity with fast.ai as I'm not very familiar with it. Hope that helps!

DataSource vs Dataset
~~~~~~~~~~~~~~~~~~~~~

A :class:`~flash.core.data.data_source.DataSource` is not the same as a :class:`torch.utils.data.Dataset`.
When a ``from_*`` method is called on your :class:`~flash.core.data.data_module.DataModule`, it gets the :class:`~flash.core.data.data_source.DataSource` to use from the :class:`~flash.core.data.process.Preprocess`.
A :class:`~torch.utils.data.Dataset` is then created from the :class:`~flash.core.data.data_source.DataSource` for each stage (`train`, `val`, `test`, `predict`) using the provided metadata (e.g. folder name, numpy array etc.).

The output of the :meth:`~flash.core.data.data_source.DataSource.load_data` can just be a :class:`torch.utils.data.Dataset` instance.
If the library that your :class:`~flash.core.model.Task` is based on provides a custom dataset, you don't need to re-write it as a :class:`~flash.core.data.data_source.DataSource`.
For example, the :meth:`~flash.core.data.data_source.DataSource.load_data` of the ``VideoClassificationPathsDataSource`` just creates an :class:`~pytorchvideo.data.EncodedVideoDataset` from the given folder.
Here's how it looks (from ``video/classification/data.py``):

Review comment (Contributor): Perhaps we could give a simpler example for something like
https://archive.ics.uci.edu/ml/datasets/iris
I find the above example to have more code than needed.

.. literalinclude:: ../../../flash/video/classification/data.py
:language: python
:dedent: 4
:pyobject: VideoClassificationPathsDataSource.load_data

Preprocess
^^^^^^^^^^

The :class:`~flash.core.data.process.Preprocess` object contains all data transforms.
Internally, we inject the :class:`~flash.core.data.process.Preprocess` transforms into the right places so that the batch can be operated on at several points along the pipeline.

Defining the standard transforms (typically at least a ``to_tensor_transform`` should be defined) for your :class:`~flash.core.data.process.Preprocess` is as simple as implementing the ``default_transforms`` method.
The :class:`~flash.core.data.process.Preprocess` must take ``train_transform``, ``val_transform``, ``test_transform``, and ``predict_transform`` arguments in the ``__init__``.
These arguments can be provided by the user (when creating the :class:`~flash.core.data.data_module.DataModule`) to override the default transforms.
Any additional arguments are up to you.

Inside the ``__init__``, we make a call to super.
This is where we register our data sources.
Data sources should be given as a dictionary which maps data source name to data source object.
The name can be anything, but if you want to take advantage of our built-in ``from_*`` classmethods, you should use :class:`~flash.core.data.data_source.DefaultDataSources` as the names.
In our case, we have both a :attr:`~flash.core.data.data_source.DefaultDataSources.NUMPY` and a custom scikit-learn data source (which we'll call `"sklearn"`).

You should also provide a ``default_data_source``.
This is the name of the data source to use by default when predicting.
It'd be cool if we could get predictions just from a numpy array, so we'll use :attr:`~flash.core.data.data_source.DefaultDataSources.NUMPY` as the default.

Here's our ``TemplatePreprocess.__init__``:

.. literalinclude:: ../../../flash/template/classification/data.py
:language: python
:dedent: 4
:pyobject: TemplatePreprocess.__init__
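
If the include isn't rendered on this page, the ``__init__`` essentially does the following (a sketch; the exact keyword arguments are assumptions based on the description above):

.. code-block:: python

    from typing import Callable, Dict, Optional

    from flash.core.data.data_source import DefaultDataSources
    from flash.core.data.process import Preprocess


    class TemplatePreprocess(Preprocess):  # rough re-creation
        def __init__(
            self,
            train_transform: Optional[Dict[str, Callable]] = None,
            val_transform: Optional[Dict[str, Callable]] = None,
            test_transform: Optional[Dict[str, Callable]] = None,
            predict_transform: Optional[Dict[str, Callable]] = None,
        ):
            super().__init__(
                train_transform=train_transform,
                val_transform=val_transform,
                test_transform=test_transform,
                predict_transform=predict_transform,
                # Register the data sources (defined earlier in ``data.py``) under
                # the names users will refer to.
                data_sources={
                    DefaultDataSources.NUMPY: TemplateNumpyDataSource(),
                    "sklearn": TemplateSKLearnDataSource(),
                },
                # The data source used by default when predicting from raw data.
                default_data_source=DefaultDataSources.NUMPY,
            )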

For our ``TemplatePreprocess``, we'll just configure a default ``to_tensor_transform``.
Let's first define the transform as a ``staticmethod``:

.. literalinclude:: ../../../flash/template/classification/data.py
:language: python
:dedent: 4
:pyobject: TemplatePreprocess.input_to_tensor

Our input samples will be dictionaries whose keys are in the :class:`~flash.core.data.data_source.DefaultDataKeys`.
You can map each key to different transforms using :class:`~flash.core.data.transforms.ApplyToKeys`.
Here's our ``default_transforms`` method:

.. literalinclude:: ../../../flash/template/classification/data.py
:language: python
:dedent: 4
:pyobject: TemplatePreprocess.default_transforms
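
Putting those two pieces together, the idea is roughly the following (a sketch, not the exact file contents; ``torch.from_numpy`` stands in for the ``input_to_tensor`` staticmethod):

.. code-block:: python

    from typing import Callable, Dict, Optional

    import torch

    from flash.core.data.data_source import DefaultDataKeys
    from flash.core.data.transforms import ApplyToKeys


    def default_transforms() -> Optional[Dict[str, Callable]]:
        # Convert only the ``INPUT`` key of each sample dictionary to a tensor.
        return {
            "to_tensor_transform": ApplyToKeys(DefaultDataKeys.INPUT, torch.from_numpy),
        }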

.. _contributing_data_module:

DataModule
^^^^^^^^^^

The :class:`~flash.core.data.data_module.DataModule` is responsible for creating the :class:`~torch.utils.data.DataLoader` and injecting the transforms for each stage.
When the user calls a ``from_*`` method (such as :meth:`~flash.core.data.data_module.DataModule.from_numpy`), the following steps take place:

#. The :meth:`~flash.core.data.data_module.DataModule.from_data_source` method is called with the name of the :class:`~flash.core.data.data_source.DataSource` to use and the inputs to provide to :meth:`~flash.core.data.data_source.DataSource.load_data` for each stage.
#. The :class:`~flash.core.data.process.Preprocess` is created from ``cls.preprocess_cls`` (if it wasn't provided by the user) with any provided transforms.
#. The :class:`~flash.core.data.data_source.DataSource` of the provided name is retrieved from the :class:`~flash.core.data.process.Preprocess`.
#. A :class:`~flash.core.data.auto_dataset.BaseAutoDataset` is created from the :class:`~flash.core.data.data_source.DataSource` for each stage.
#. The :class:`~flash.core.data.data_module.DataModule` is instantiated with the data sets.

To create our ``TemplateData`` :class:`~flash.core.data.data_module.DataModule`, we first need to attach our preprocess class like this:

.. code-block:: python

preprocess_cls = TemplatePreprocess

Since we provided a :attr:`~flash.core.data.data_source.DefaultDataSources.NUMPY` :class:`~flash.core.data.data_source.DataSource` in the ``TemplatePreprocess``, :meth:`~flash.core.data.data_module.DataModule.from_numpy` will now work with our ``TemplateData``.
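
For example, a user can now construct the data module directly from numpy arrays, along these lines (the shapes and keyword arguments shown are illustrative):

.. code-block:: python

    import numpy as np

    from flash.template import TemplateData

    # 20 examples with 4 features each, and targets from 3 classes.
    datamodule = TemplateData.from_numpy(
        train_data=np.random.rand(20, 4),
        train_targets=np.random.randint(0, 3, size=(20,)),
        val_split=0.2,
    )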

If you've defined a fully custom :class:`~flash.core.data.data_source.DataSource` (like our ``TemplateSKLearnDataSource``), then you will need to write a ``from_*`` method for each.
Here's the ``from_sklearn`` method for our ``TemplateData``:

.. literalinclude:: ../../../flash/template/classification/data.py
:language: python
:dedent: 4
:pyobject: TemplateData.from_sklearn

The final step is to implement the ``num_features`` property for our ``TemplateData``.
This is just a convenience for the user: it looks up the ``num_features`` attribute set on any of the data sets and returns it.
Here's the code:

.. literalinclude:: ../../../flash/template/classification/data.py
:language: python
:dedent: 4
:pyobject: TemplateData.num_features
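
In case the include isn't rendered here, the property amounts to something like this (a sketch; the attribute lookup details are assumptions):

.. code-block:: python

    from typing import Optional

    # Defined on ``TemplateData``:
    @property
    def num_features(self) -> Optional[int]:
        # ``load_data`` sets ``num_features`` on each generated dataset, so return
        # it from whichever split is available.
        n_train = getattr(self.train_dataset, "num_features", None)
        n_val = getattr(self.val_dataset, "num_features", None)
        n_test = getattr(self.test_dataset, "num_features", None)
        return n_train or n_val or n_test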

BaseVisualization
^^^^^^^^^^^^^^^^^

An optional step is to implement a ``BaseVisualization``, which lets you control how data at various points in the pipeline can be visualized.
This is extremely useful for debugging purposes, allowing users to view their data and understand the impact of their transforms.

Take a look at our ``TemplateVisualization`` to get started:

.. note::
Don't worry about implementing it right away; you can always come back and add it later!

.. autoclass:: flash.template.classification.data.TemplateVisualization
:members:

.. raw:: html

<details>
<summary>Source</summary>

.. literalinclude:: ../../../flash/template/classification/data.py
:language: python
:pyobject: TemplateVisualization

.. raw:: html

</details>

Postprocess
^^^^^^^^^^^

Sometimes you have some transforms that need to be applied *after* your model.
For this you can optionally implement a :class:`~flash.core.data.process.Postprocess`.
The :class:`~flash.core.data.process.Postprocess` is applied to the model outputs during inference.
You may want to use it for: converting tokens back into text, applying an inverse normalization to an output image, resizing a generated image back to the size of the input, etc.
As an example, here's the :class:`~flash.text.classification.data.TextClassificationPostProcess`, which gets the logits from a ``SequenceClassifierOutput``:

.. literalinclude:: ../../../flash/text/classification/data.py
:language: python
:pyobject: TextClassificationPostProcess
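
As a simpler, hypothetical illustration of the same idea, a postprocess that turns raw logits into probabilities could look like this:

.. code-block:: python

    import torch

    from flash.core.data.process import Postprocess


    class SoftmaxPostprocess(Postprocess):
        """Hypothetical postprocess: convert per-sample logits into probabilities."""

        def per_sample_transform(self, sample: torch.Tensor) -> torch.Tensor:
            return torch.softmax(sample, dim=-1)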

------

Now that you've got some data, it's time to :ref:`implement your task! <contributing_task>`
29 changes: 29 additions & 0 deletions docs/source/template/docs.rst
@@ -0,0 +1,29 @@
.. _contributing_docs:

*********
The Docs
*********

The final step is to add some docs.
For each :class:`~flash.core.model.Task` in Flash, we have a docs page in ``docs/source/reference``.
You should create a ``.rst`` file there with the following:

- a brief description of the task
- the predict example
- the finetuning example
- any relevant API reference

Here are the contents of ``docs/source/reference/template.rst``, which break down each of these steps:

.. literalinclude:: ../reference/template.rst
:language: rest

:ref:`Here's the rendered doc page! <template>`

------

Once the docs are done, it's finally time to open a PR and wait for some reviews!

|

Congratulations on adding your first :class:`~flash.core.model.Task` to Flash, we hope to see you again soon!