Lightning-Universe · tchaton · Apr 13, 2021 · Apr 8, 2021 · Apr 9, 2021 · Apr 9, 2021
@@ -4,26 +4,39 @@ Tutorial: Creating a Custom Task
 In this tutorial we will go over the process of creating a custom task,
 along with a custom data module.
 
+1 . Imports
+-----------
+
+
 .. testcode:: python
 
-    import flash
+    from typing import Any, List, Tuple
 
+    import numpy as np
     import torch
-    from torch.utils.data import TensorDataset, DataLoader
-    from torch import nn
+    from pytorch_lightning import seed_everything
     from sklearn import datasets
     from sklearn.model_selection import train_test_split
+    from torch import nn
+
+    import flash
+    from flash.data.auto_dataset import AutoDataset
+    from flash.data.process import Postprocess, Preprocess
+
+    seed_everything(42)
 
-The Task: Linear regression
----------------------------
+
+2 . The Task: Linear regression
+-------------------------------
 
 Here we create a basic linear regression task by subclassing
-``flash.Task``. For the majority of tasks, you will likely only need to
+:class:`~flash.core.model.Task`. For the majority of tasks, you will likely only need to
 override the ``__init__`` and ``forward`` methods.
 
 .. testcode::
 
     class LinearRegression(flash.Task):
+
         def __init__(self, num_inputs, learning_rate=0.001, metrics=None):
             # what kind of model do we want?
             model = nn.Linear(num_inputs, 1)
@@ -58,35 +71,124 @@ testing) or override ``training_step``, ``validation_step``, and
 Lightning’s
 `methods <https://pytorch-lightning.readthedocs.io/en/latest/lightning_module.html#methods>`__.
 
-The Data
---------
+3 . The Data
+-------------
+
+For a task you will likely need a specific way of loading data.
+
+It is recommended to create a :class:`~flash.data.process.Preprocess` object.
+The :class:`~flash.data.process.Preprocess` contains all the processing logic and are similar to ``Callback``.
+The user has to override hooks with their processing logic.
+
+.. note::
+    As new concepts are being introduced, we strongly encourage the reader to click on :class:`~flash.data.process.Preprocess`
+    before going further in the tutorial.
+
+The user would have to implement a :class:`~flash.data.data_module.DataModule` as a way to perform data checks and instantiate the preprocess.
+
+.. note::
+
+   Philosophically, the :class:`~flash.data.process.Preprocess` belongs with the :class:`~flash.data.data_module.DataModule`
+   and the :class:`~flash.data.process.Postprocess` with the :class:`~flash.core.model.Task`.
+
+
+3.a The DataModule API
+----------------------
+
+First, let's design the user-facing API. The ``NumpyDataModule`` will provide a ``from_xy_dataset`` helper ``classmethod``.
+
+Example::
+
+    x, y = ...
+    preprocess_cls = ...
+    datamodule = NumpyDataModule.from_xy_dataset(x, y, preprocess_cls)
+
+Here are the `NumpyDataModule`` implementation:
+
+Example::
+
+    from flash import DataModule
+    from flash.data.process import Preprocess
+
+    class NumpyDataModule(DataModule):
+
+        @classmethod
+        def from_xy_dataset(cls, x: ND, y: ND, preprocess_cls: Preprocess = NumpyPreprocess, batch_size: int = 64, num_workers: int = 0):
+
+            preprocess = preprocess_cls()
+
+            x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.20, random_state=0)
+
+            dm = cls.from_load_data_inputs(
+                train_load_data_input=(x_train, y_train),
+                test_load_data_input=(x_test, y_test),
+                preprocess=preprocess,  # DON'T FORGET TO PROVIDE THE PREPROCESS
+                batch_size=batch_size,
+                num_workers=num_workers
+            )
+            # Some metatada can be accessed from ``train_ds`` directly.
+            dm.num_inputs = dm.train_dataset.num_inputs
+            return dm
 
-For a task you will likely need a specific way of loading data. For this
-example, lets say we want a ``flash.DataModule`` to be used explicitly
-for the prediction of diabetes disease progression. We can create this
-``DataModule`` below, wrapping the scikit-learn `Diabetes
-dataset <https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset>`__.
 
+.. note::
 
-You’ll notice we added a ``DataPipeline``, which will be used when we
-call ``.predict()`` on our model. In this case we want to nicely format
-our ouput from the model with the string ``"disease progression"``, but
-you could do any sort of post processing you want (see :ref:`datapipeline`).
+    The :class:`~flash.data.data_module.DataModule` provides a ``from_load_data_inputs`` helper function. This function will take care
+    of connecting the provided :class:`~flash.data.process.Preprocess` with the :class:`~flash.data.data_module.DataModule`.
+    Make sure to instantiate your :class:`~flash.data.data_module.DataModule` with this helper if you rely on :class:`~flash.data.process.Preprocess`
+    objects.
 
-Fit
----
+3.b The Preprocess API
+----------------------
+
+Example::
+
+    import torch
+    from torch import Tensor
+    import numpy as np
+
+    ND = np.ndarray
+
+    class NumpyPreprocess(Preprocess):
+
+        def load_data(self, data: Tuple[ND, ND], dataset: AutoDataset) -> List[Tuple[ND, float]]:
+            if self.training:
+                dataset.num_inputs = data[0].shape[1]
+            return [(x, y) for x, y in zip(*data)]
+
+        def to_tensor_transform(self, sample: Any) -> Tuple[Tensor, Tensor]:
+            x, y = sample
+            x = torch.from_numpy(x).float()
+            y = torch.tensor(y, dtype=torch.float)
+            return x, y
+
+        def predict_load_data(self, data: ND) -> ND:
+            return data
+
+        def predict_to_tensor_transform(self, sample: ND) -> ND:
+            return torch.from_numpy(sample).float()
+
+4. Fitting
+----------
+
+For this task, we will be using ``scikit-learn`` `Diabetes
+dataset <https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset>`__.
 
 Like any Flash Task, we can fit our model using the ``flash.Trainer`` by
 supplying the task itself, and the associated data:
 
 .. code:: python
 
-    data = DiabetesData()
-    model = LinearRegression(num_inputs=data.num_inputs)
+    x, y = datasets.load_diabetes(return_X_y=True)
+    datamodule = NumpyDataModule.from_xy_dataset(x, y)
+    model = LinearRegression(num_inputs=datamodule.num_inputs)
 
     trainer = flash.Trainer(max_epochs=1000)
     trainer.fit(model, data)
 
+5. Predicting
+-------------
+
 With a trained model we can now perform inference. Here we will use a
 few examples from the test set of our data:
 
@@ -99,15 +201,48 @@ few examples from the test set of our data:
         [-0.0128, -0.0446, -0.0235, -0.0401, -0.0167,  0.0046, -0.0176, -0.0026, -0.0385, -0.0384],
         [-0.0237, -0.0446,  0.0455,  0.0907, -0.0181, -0.0354,  0.0707, -0.0395, -0.0345, -0.0094]])
 
-    model.predict(predict_data)
+    predictions = model.predict(predict_data)
+    print(predictions)
+    #out: [tensor([14.7190]), tensor([14.7100]), tensor([14.7288]), tensor([14.6685]), tensor([14.6687])]
+
+
+6. Customize PostProcess
+------------------------
+
+To customize the postprocessing of this task, you can create a :class:`~flash.data.process.Postprocess` objects and assign it to your model as follows:
+
+.. code:: python
+
+    class CustomPostprocess(Postprocess):
+
+        THRESHOLD = 14.72
+
+        def predict_per_sample_transform(self, pred: Any) -> Any:
+            if pred > self.THRESHOLD:
+
+                def send_slack_message(pred):
+                    print(f"This prediction: {pred} is above the threshold: {self.THRESHOLD}")
+
+                send_slack_message(pred)
+            return pred
+
+
+    class LinearRegression(flash.Task):
+
+        # ``postprocess_cls`` is a special attribute name used internally
+        # to instantiate your Postprocess.
+        postprocess_cls = CustomPostprocess
+
+        ...
+
+And when running predict one more time.
+
+.. code:: python
 
-Because of our custom data pipeline’s ``after_uncollate`` method, we
-will get a nicely formatted output like the following:
+    predict_data = ...
 
-.. code::
+    predictions = model.predict(predict_data)
+    # out: This prediction: tensor([14.7288]) is above the threshold: 14.72
 
-   ['disease progression: 155.90',
-    'disease progression: 156.59',
-    'disease progression: 152.69',
-    'disease progression: 149.05',
-    'disease progression: 150.90']
+    print(predictions)
+    # out: [tensor([14.7190]), tensor([14.7100]), tensor([14.7288]), tensor([14.6685]), tensor([14.6687])]
@@ -0,0 +1,46 @@
+########
+Callback
+########
+
+.. _callback:
+
+**************
+Flash Callback
+**************
+
+:class:`~flash.data.callback.FlashCallback` are extensions of the PyTorch Lightning `Callback <https://pytorch-lightning.readthedocs.io/en/latest/extensions/callbacks.html>`__.
+
+A callback is a self-contained program that can be reused across projects.
+
+Flash and Lightning have a callback system to execute callbacks when needed.
+
+Callbacks should capture NON-ESSENTIAL logic that is NOT required for your lightning module to run.
+
+*******************
+Available Callbacks
+*******************
+
+
+BaseDataFetcher
+_______________
+
+.. autoclass:: flash.data.callback.BaseDataFetcher
+   :members: enable
+
+BaseViz
+_______
+
+.. autoclass:: flash.data.base_viz.BaseViz
+   :members:
+
+
+*************
+API reference
+*************
+
+
+FlashCallback
+_____________
+
+.. autoclass:: flash.data.callback.FlashCallback
+    :members: