
Commit

update on comments
tchaton committed Apr 13, 2021
1 parent 846b7d5 commit c096c6f
Showing 2 changed files with 41 additions and 13 deletions.
32 changes: 23 additions & 9 deletions docs/source/general/data.rst
@@ -38,19 +38,28 @@ How to use out-of-the-box flashdatamodules
Flash provides several DataModules with helper functions.
Check out the :ref:`image_classification` section or any other task to learn more about them.

***************
Data Processing
***************

Currently, it is common practice to implement a `Dataset <https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset>`_
and provide it to a `DataLoader <https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader>`_.

However, after model training, it takes a lot of engineering overhead to run inference on raw data and deploy the model in a production environment.
Usually, extra processing logic must be added to bridge the gap between training data and raw data.

The :class:`~flash.data.process.Preprocess` and :class:`~flash.data.process.Postprocess` classes can be used to
store the data as well as the preprocessing and postprocessing transforms.

By providing a series of hooks that can be overridden with custom data processing logic,
the user has much more granular control over their data processing flow.

Here are the primary advantages:

* Making inference on raw data simple
* Making the code more readable, modular and self-contained
* Making data augmentation experimentation simpler
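To illustrate the hooks described above, here is a minimal sketch of a custom
:class:`~flash.data.process.Preprocess` (the hook names ``to_tensor_transform`` and
``train_pre_tensor_transform`` and the transforms used are assumptions for illustration,
not a definitive implementation)::

    from typing import Any

    from torchvision import transforms as T

    from flash.data.process import Preprocess


    class CustomPreprocess(Preprocess):

        def to_tensor_transform(self, sample: Any) -> Any:
            # runs at every stage: convert the raw input (e.g. a PIL image) to a tensor
            return T.ToTensor()(sample)

        def train_pre_tensor_transform(self, sample: Any) -> Any:
            # the ``train_`` prefix restricts this augmentation to the training stage
            return T.RandomHorizontalFlip()(sample)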


To change the processing behavior only on specific stages for a given hook,
you can prefix each of the :class:`~flash.data.process.Preprocess` and :class:`~flash.data.process.Postprocess`
@@ -70,9 +79,7 @@ Check out :class:`~flash.data.process.Preprocess` for some examples.
How to customize existing datamodules
*************************************

A Flash DataModule can receive datasets directly, as follows:

Example::

@@ -252,6 +259,13 @@ Example::
return self.to_tensor(sample[0]), sample[1]
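To make this concrete, here is a minimal sketch of passing datasets straight to a DataModule
(the import path and keyword names such as ``train_dataset`` are assumptions about this
version of the API, not a definitive usage)::

    import torch
    from torch.utils.data import TensorDataset

    from flash.data.data_module import DataModule

    # toy datasets standing in for real train / validation data
    train_ds = TensorDataset(torch.randn(64, 3), torch.randint(0, 2, (64,)))
    val_ds = TensorDataset(torch.randn(16, 3), torch.randint(0, 2, (16,)))

    datamodule = DataModule(train_dataset=train_ds, val_dataset=val_ds, batch_size=8)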


.. note::

    Currently, Flash Tasks are implemented using :class:`~flash.data.process.Preprocess` and :class:`~flash.data.process.Postprocess`.
    However, this is not a hard requirement and one can still use :class:`~torch.utils.data.Dataset`, but we highly recommend
    using :class:`~flash.data.process.Preprocess` and :class:`~flash.data.process.Postprocess` instead.


*************
API reference
*************
22 changes: 18 additions & 4 deletions flash/data/process.py
@@ -102,12 +102,16 @@ class PreprocessState:

class Preprocess(Properties, torch.nn.Module):
"""
The :class:`~flash.data.process.Preprocess` encapsulates
all the data processing and loading logic that should run before the data is passed to the model.

It is particularly relevant when you want to provide an end-to-end implementation which works
with 4 different stages: ``train``, ``validation``, ``test``, and inference (``predict``).

You can override any of the preprocessing hooks to provide custom functionality.
All hooks default to no-ops (except ``collate``, which uses the PyTorch default
`collate <https://pytorch.org/docs/stable/data.html#dataloader-collate-fn>`_).

The :class:`~flash.data.process.Preprocess` supports the following hooks:

- ``load_data``: receives some metadata and generates a Mapping from it.
@@ -188,7 +192,8 @@ class Preprocess(Properties, torch.nn.Module):
.. note::

    By default, each hook is a no-op except ``collate``, which uses the PyTorch default
    `collate <https://pytorch.org/docs/stable/data.html#dataloader-collate-fn>`_.
    To customize them, just override the hooks and ``Flash`` will take care of calling them at the right moment.
.. note::
@@ -314,7 +319,16 @@ def add_callbacks(self, callbacks: List['FlashCallback']):

@classmethod
def load_data(cls, data: Any, dataset: Optional[Any] = None) -> Mapping:
"""Loads entire data from Dataset"""
"""Loads entire data from Dataset. The input ``data`` can be anything, but you need to return a Mapping.
Example::
# data: "."
# output: [("./cat/1.png", 1), ..., ("./dog/10.png", 0)]
output: Mapping = load_data(data)
"""
return data

@classmethod
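To illustrate the new ``load_data`` docstring above, here is a minimal sketch of overriding the
hook in a subclass (``FolderPreprocess`` and the folder-per-class layout are illustrative
assumptions)::

    import os
    from typing import Any, Mapping, Optional

    from flash.data.process import Preprocess


    class FolderPreprocess(Preprocess):

        @classmethod
        def load_data(cls, data: Any, dataset: Optional[Any] = None) -> Mapping:
            # data: a root folder with one sub-folder per class, e.g. "./cat/1.png"
            # output: a list of (file path, label) pairs, as in the docstring example
            classes = sorted(os.listdir(data))
            return [
                (os.path.join(data, name, fname), label)
                for label, name in enumerate(classes)
                for fname in os.listdir(os.path.join(data, name))
            ]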
