diff --git a/docs/source/general/data.rst b/docs/source/general/data.rst
index da21d3c09a..38ca3a580b 100644
--- a/docs/source/general/data.rst
+++ b/docs/source/general/data.rst
@@ -38,19 +38,28 @@ How to use out-of-the-box flashdatamodules
 Flash provides several DataModules with helpers functions.
 Checkout the :ref:`image_classification` section or any other tasks to learn more about them.
 
-********************************
-Why Preprocess and PostProcess ?
-********************************
+***************
+Data Processing
+***************
 
 Currently, it is common practice to implement a `Dataset `_
 and provide them to a `DataLoader `_.
 However, after model training, it requires a lot of engineering overhead to make inference on raw data
 and deploy the model in production environnement.
+Usually, extra processing logic should be added to bridge the gap between training data and raw data.
 
-The :class:`~flash.data.process.Preprocess` and :class:`~flash.data.process.Postprocess` have been created to resolve these issues.
-By providing a series of hooks that can be overridden with custom data processing logic, the user has much more granular control over their data processing flow.
+The :class:`~flash.data.process.Preprocess` and :class:`~flash.data.process.Postprocess` classes can be used to
+store the data as well as the preprocessing and postprocessing transforms.
+
+By providing a series of hooks that can be overridden with custom data processing logic,
+the user has much more granular control over their data processing flow.
+
+Here are the primary advantages:
+
+* Making inference on raw data simple
+* Make the code more readable, modular and self-contained
+* Data Augmentation experimentation is simpler
 
-But it also makes your code more readable, modular and easy to extend.
 
 To change the processing behavior only on specific stages for a given hook,
 you can prefix each of the :class:`~flash.data.process.Preprocess` and :class:`~flash.data.process.Postprocess`
@@ -70,9 +79,7 @@ Check out :class:`~flash.data.process.Preprocess` for some examples.
 How to customize existing datamodules
 *************************************
 
-Currently, Flash Tasks are implementing using :class:`~flash.data.process.Preprocess` and :class:`~flash.data.process.Postprocess`.
-However, it is not a hard requirement and one can still use :class:`~torch.data.utils.Dataset`, but we highly recommend
-using :class:`~flash.data.process.Preprocess` and :class:`~flash.data.process.Postprocess` instead.
+Flash DataModule can receive directly dataset as follow:
 
 Example::
 
@@ -252,6 +259,13 @@ Example::
 
             return self.to_tensor(sample[0]), sample[1]
 
+.. note::
+
+    Currently, Flash Tasks are implemented using :class:`~flash.data.process.Preprocess` and :class:`~flash.data.process.Postprocess`.
+    However, it is not a hard requirement and one can still use :class:`~torch.data.utils.Dataset`, but we highly recommend
+    using :class:`~flash.data.process.Preprocess` and :class:`~flash.data.process.Postprocess` instead.
+
+
 *************
 API reference
 *************
diff --git a/flash/data/process.py b/flash/data/process.py
index f1e28e5b6d..670f906ed0 100644
--- a/flash/data/process.py
+++ b/flash/data/process.py
@@ -102,12 +102,16 @@ class PreprocessState:
 
 class Preprocess(Properties, torch.nn.Module):
     """
-    The :class:`~flash.data.process.Preprocess` is used to encapsulate
-    all the processing data loading logic up to the model.
+    The :class:`~flash.data.process.Preprocess` encapsulates
+    all the data processing and loading logic that should run before the data is passed to the model.
 
     It is particularly relevant when you want to provide an end to end implementation which works
     with 4 different stages: ``train``, ``validation``, ``test``, and inference (``predict``).
 
+    You can override any of the preprocessing hooks to provide custom functionality.
+    All hooks default to no-op (except the collate which is PyTorch default
+    `collate `_)
+
     The :class:`~flash.data.process.Preprocess` supports the following hooks:
 
         - ``load_data``: Function to receiving some metadata to generate a Mapping from.
@@ -188,7 +192,8 @@ class Preprocess(Properties, torch.nn.Module):
 
     .. note::
 
-        By default, each hook will be no-op execpt the collate which is PyTorch default collate.
+        By default, each hook will be no-op execpt the collate which is PyTorch default
+        `collate `_.
         To customize them, just override the hooks and ``Flash`` will take care of calling them at the right moment.
 
     .. note::
@@ -314,7 +319,16 @@ def add_callbacks(self, callbacks: List['FlashCallback']):
 
     @classmethod
     def load_data(cls, data: Any, dataset: Optional[Any] = None) -> Mapping:
-        """Loads entire data from Dataset"""
+        """Loads entire data from Dataset. The input ``data`` can be anything, but you need to return a Mapping.
+
+        Example::
+
+            # data: "."
+            # output: [("./cat/1.png", 1), ..., ("./dog/10.png", 0)]
+
+            output: Mapping = load_data(data)
+
+        """
         return data
 
     @classmethod
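
The ``load_data`` docstring example in the last hunk can be made concrete with a short sketch. The function below is a hypothetical, standalone stand-in for a ``load_data`` override, not Flash's implementation. It assumes ``data`` is a folder with one sub-folder per class, and it derives integer labels from the sorted folder names, so the numbering need not match the docstring's example.

```python
import os
from typing import List, Tuple


def load_data(data: str) -> List[Tuple[str, int]]:
    """Hypothetical sketch: list (file path, label) pairs found under ``data``."""
    # Assumption of this sketch: each sub-folder of ``data`` is one class.
    class_names = sorted(
        name for name in os.listdir(data) if os.path.isdir(os.path.join(data, name))
    )
    samples = []
    for label, class_name in enumerate(class_names):
        class_dir = os.path.join(data, class_name)
        # Sort file names so the output order is deterministic.
        for file_name in sorted(os.listdir(class_dir)):
            samples.append((os.path.join(class_dir, file_name), label))
    return samples
```

Only file paths and labels are collected here; opening each file would be left to a later per-sample step, which keeps this function cheap to call.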