
Commit

update on comments
tchaton committed Apr 13, 2021
1 parent 846b7d5 commit c096c6f
Showing 2 changed files with 41 additions and 13 deletions.
32 changes: 23 additions & 9 deletions docs/source/general/data.rst
@@ -38,19 +38,28 @@ How to use out-of-the-box flashdatamodules
Flash provides several DataModules with helper functions.
Check out the :ref:`image_classification` section or any other task to learn more about them.

***************
Data Processing
***************

Currently, it is common practice to implement a `Dataset <https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset>`_
and provide it to a `DataLoader <https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader>`_.

However, after model training, it takes a lot of engineering overhead to run inference on raw data and deploy the model in a production environment.
Usually, extra processing logic must be added to bridge the gap between training data and raw data.

The :class:`~flash.data.process.Preprocess` and :class:`~flash.data.process.Postprocess` classes can be used to
store the data as well as the preprocessing and postprocessing transforms.

By providing a series of hooks that can be overridden with custom data processing logic,
the user has much more granular control over their data processing flow.

Here are the primary advantages:

* Making inference on raw data simple
* Making the code more readable, modular and self-contained
* Making data augmentation experimentation simpler
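To illustrate the hooks described above, here is a minimal sketch of a custom
:class:`~flash.data.process.Preprocess` (the hook names ``to_tensor_transform`` and
``train_pre_tensor_transform`` and the transforms used are assumptions for illustration,
not a definitive implementation)::

    from typing import Any

    from torchvision import transforms as T

    from flash.data.process import Preprocess


    class CustomPreprocess(Preprocess):

        def to_tensor_transform(self, sample: Any) -> Any:
            # runs at every stage: convert the raw input (e.g. a PIL image) to a tensor
            return T.ToTensor()(sample)

        def train_pre_tensor_transform(self, sample: Any) -> Any:
            # the ``train_`` prefix restricts this augmentation to the training stage
            return T.RandomHorizontalFlip()(sample)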


To change the processing behavior only on specific stages for a given hook,
you can prefix each of the :class:`~flash.data.process.Preprocess` and :class:`~flash.data.process.Postprocess`
@@ -70,9 +79,7 @@ Check out :class:`~flash.data.process.Preprocess` for some examples.
How to customize existing datamodules
*************************************

A Flash DataModule can receive datasets directly, as follows:

Example::

@@ -252,6 +259,13 @@ Example::
return self.to_tensor(sample[0]), sample[1]
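To make this concrete, here is a minimal sketch of passing datasets straight to a DataModule
(the import path and keyword names such as ``train_dataset`` are assumptions about this
version of the API, not a definitive usage)::

    import torch
    from torch.utils.data import TensorDataset

    from flash.data.data_module import DataModule

    # toy datasets standing in for real train / validation data
    train_ds = TensorDataset(torch.randn(64, 3), torch.randint(0, 2, (64,)))
    val_ds = TensorDataset(torch.randn(16, 3), torch.randint(0, 2, (16,)))

    datamodule = DataModule(train_dataset=train_ds, val_dataset=val_ds, batch_size=8)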


.. note::

    Currently, Flash Tasks are implemented using :class:`~flash.data.process.Preprocess` and :class:`~flash.data.process.Postprocess`.
    However, this is not a hard requirement and one can still use :class:`~torch.utils.data.Dataset`, but we highly recommend
    using :class:`~flash.data.process.Preprocess` and :class:`~flash.data.process.Postprocess` instead.


*************
API reference
*************
22 changes: 18 additions & 4 deletions flash/data/process.py
@@ -102,12 +102,16 @@ class PreprocessState:

class Preprocess(Properties, torch.nn.Module):
"""
The :class:`~flash.data.process.Preprocess` encapsulates
all the data processing and loading logic that should run before the data is passed to the model.

It is particularly relevant when you want to provide an end-to-end implementation which works
with 4 different stages: ``train``, ``validation``, ``test``, and inference (``predict``).

You can override any of the preprocessing hooks to provide custom functionality.
All hooks default to no-ops (except ``collate``, which uses the PyTorch default
`collate <https://pytorch.org/docs/stable/data.html#dataloader-collate-fn>`_).

The :class:`~flash.data.process.Preprocess` supports the following hooks:

- ``load_data``: receives some metadata and generates a Mapping from it.
@@ -188,7 +192,8 @@ class Preprocess(Properties, torch.nn.Module):
.. note::

    By default, each hook is a no-op except ``collate``, which uses the PyTorch default
    `collate <https://pytorch.org/docs/stable/data.html#dataloader-collate-fn>`_.
    To customize them, just override the hooks and ``Flash`` will take care of calling them at the right moment.
.. note::
@@ -314,7 +319,16 @@ def add_callbacks(self, callbacks: List['FlashCallback']):

@classmethod
def load_data(cls, data: Any, dataset: Optional[Any] = None) -> Mapping:
"""Loads entire data from Dataset"""
"""Loads entire data from Dataset. The input ``data`` can be anything, but you need to return a Mapping.
Example::
# data: "."
# output: [("./cat/1.png", 1), ..., ("./dog/10.png", 0)]
output: Mapping = load_data(data)
"""
return data

@classmethod
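To illustrate the new ``load_data`` docstring above, here is a minimal sketch of overriding the
hook in a subclass (``FolderPreprocess`` and the folder-per-class layout are illustrative
assumptions)::

    import os
    from typing import Any, Mapping, Optional

    from flash.data.process import Preprocess


    class FolderPreprocess(Preprocess):

        @classmethod
        def load_data(cls, data: Any, dataset: Optional[Any] = None) -> Mapping:
            # data: a root folder with one sub-folder per class, e.g. "./cat/1.png"
            # output: a list of (file path, label) pairs, as in the docstring example
            classes = sorted(os.listdir(data))
            return [
                (os.path.join(data, name, fname), label)
                for label, name in enumerate(classes)
                for fname in os.listdir(os.path.join(data, name))
            ]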
