Hello @ethanwharris! Thanks for updating this PR.
Comment last updated at 2021-05-19 18:39:05 UTC
Codecov Report
@@ Coverage Diff @@
## master #306 +/- ##
==========================================
+ Coverage 86.91% 87.02% +0.10%
==========================================
Files 78 83 +5
Lines 4021 4130 +109
==========================================
+ Hits 3495 3594 +99
- Misses 526 536 +10
In this section, we briefly describe the data, and then ``literalinclude`` our finetuning example.

Now we'll train on Fisher's classic iris data.
It contains 150 records with four features (sepal length, sepal width, petal length, and petal width) in three classes (species of Iris: setosa, virginica and versicolor).
Include a link to images to make your description better.
This is just tabular data, so I'm not sure what images we would show here
@edenlightning thanks for the review! - summary of main changes made:
:dedent: 4
:pyobject: TemplateSKLearnDataSource.predict_load_data

DataSource vs Dataset
Let me rephrase it to see if I understand it correctly:
A DataSource serves a similar function to a Dataset, except that it includes preprocessing methods, generates a Dataset when we call load_data, and will generate (possibly different) Datasets for training, validation, etc.
It may also be useful to understand how it differs from torch.utils.data.DataLoader, since a Dataset only requires `__getitem__`, but a DataLoader also does some preprocessing, although I think it does not distinguish between training, validation ...
It also looks similar to https://docs.fast.ai/data.load.html, no?
The high-level view is this:
- DataSource is used to generate multiple datasets (e.g. train, test, val, predict)
- The preprocessing methods are stored in Preprocess
- When the dataloader is created, the preprocess transforms are injected into the workers and the model so that they are all called in the right place
So DataSource, Preprocess, and DataPipeline are really just a different way of creating a Dataset and DataLoader (not a replacement). Can't speak to the similarity with fast.ai as I'm not very familiar with it. Hope that helps!
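To make that split concrete, here is a minimal plain-Python sketch of the idea. The names `ToyDataSource`, `ToyPreprocess`, `to_datasets`, and `make_loader` are hypothetical and exist only for this illustration; they are not Flash's actual classes or signatures, and a real dataloader would inject the transforms into its workers rather than use a simple generator.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Sequence


class ToyPreprocess:
    """Hypothetical stand-in for a Preprocess: holds one transform per stage."""

    def __init__(self, transforms: Optional[Dict[str, Callable[[Any], Any]]] = None):
        self.transforms = transforms or {}

    def apply(self, stage: str, sample: Any) -> Any:
        # Fall back to the identity when no transform is registered for a stage.
        transform = self.transforms.get(stage, lambda x: x)
        return transform(sample)


class ToyDataSource:
    """Hypothetical stand-in for a DataSource: one object, several datasets."""

    def load_data(self, data: Sequence[Any], stage: str) -> List[Any]:
        # A real implementation might read files or folders here,
        # and could behave differently depending on the stage.
        return list(data)

    def to_datasets(self, **splits: Sequence[Any]) -> Dict[str, List[Any]]:
        # Build one dataset per stage that was actually provided.
        return {stage: self.load_data(data, stage) for stage, data in splits.items()}


def make_loader(dataset: List[Any], preprocess: ToyPreprocess, stage: str) -> Iterator[Any]:
    # Toy "dataloader": the stage's transform is applied when samples are read,
    # not when the dataset is created.
    for sample in dataset:
        yield preprocess.apply(stage, sample)


source = ToyDataSource()
datasets = source.to_datasets(train=[1, 2, 3], val=[4, 5], predict=[6])
preprocess = ToyPreprocess({"train": lambda x: x * 10})

train_batches = list(make_loader(datasets["train"], preprocess, "train"))
val_batches = list(make_loader(datasets["val"], preprocess, "val"))
```

In this sketch the data source only knows how to produce datasets, the preprocess object only knows the transforms, and the loader is where the two meet, which mirrors the three bullets above.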
If the library that your :class:`~flash.core.data.model.Task` is based on provides a custom dataset, you don't need to re-write it as a :class:`~flash.core.data.data_source.DataSource`.
For example, the :meth:`~flash.core.data.data_source.DataSource.load_data` of the ``VideoClassificationPathsDataSource`` just creates an :class:`~pytorchvideo.data.EncodedVideoDataset` from the given folder.
Here's how it looks (from `video/classification/data.py <https://github.com/PyTorchLightning/lightning-flash/blob/master/flash/video/classification/data.py>`_):
Perhaps we could give a simpler example, for something like
https://archive.ics.uci.edu/ml/datasets/iris?
I find that the above example has more code than needed.
Thanks for all the changes! I think it looks great :)
Just a couple more nits
Co-authored-by: edenlightning <[email protected]>
What does this PR do?
Fixes # (issue)
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃