Clean up dataloader logic #926
Conversation
Hello @williamFalcon! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-02-25 03:14:52 UTC
dl_args = {
    'dataset': dataloader.dataset,
    'batch_size': dataloader.batch_size,
    'shuffle': False,
What if a user wants to shuffle batches (when running on a single machine)? I see below that in certain cases you're re-setting this value to False; did you intend to set it to True here?
I would rather move shuffle to the function arguments, since the other values are taken from the dataloader and only this one is hard-coded.
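A minimal sketch of what I mean, with illustrative names (rebuild_dataloader is not in the PR):

from torch.utils.data import DataLoader

def rebuild_dataloader(dataloader, shuffle=False):
    # Take shuffle as an explicit argument; everything else is copied
    # from the user's dataloader, so only shuffle needs to be decided here.
    dl_args = {
        'dataset': dataloader.dataset,
        'batch_size': dataloader.batch_size,
        'shuffle': shuffle,
        'num_workers': dataloader.num_workers,
    }
    return DataLoader(**dl_args)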
The PR is really huge, so these are just my quick comments...
def prepare_data(self):
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (1.0,))])
    dataset = MNIST(root=self.hparams.data_root, train=True,
duplicated
@@ -1,5 +1,6 @@
import traceback
add a warning here as well
    XLA_AVAILABLE = True

except ImportError:
    XLA_AVAILABLE = False
Rather:
try:
import torch_xla.core.xla_model as xm
except ImportError:
XLA_AVAILABLE = False
else:
XLA_AVAILABLE = True
dl_args = {
    'dataset': dataloader.dataset,
    'batch_size': dataloader.batch_size,
    'shuffle': False,
I would rather move shuffle to the function arguments, since the other values are taken from the dataloader and only this one is hard-coded.
if train:
    if self.use_ddp or self.use_ddp2:
        sampler = DistributedSampler(dataloader.dataset)
        dl_args['shuffle'] = False
why set this here if it is already fixed as False above?
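For context, a minimal sketch of the usual pattern: the DataLoader's shuffle has to stay False when a sampler is passed, because sampler and shuffle=True are mutually exclusive in PyTorch and the DistributedSampler does the shuffling itself.

import torch.distributed as dist
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def build_ddp_loader(dataset, batch_size):
    # The sampler shards and shuffles per rank; the DataLoader must not shuffle.
    sampler = DistributedSampler(dataset,
                                 num_replicas=dist.get_world_size(),
                                 rank=dist.get_rank())
    return DataLoader(dataset, batch_size=batch_size, shuffle=False, sampler=sampler)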
        warnings.warn(msg)
        break

def init_test_dataloader(self, model):
I guess this can simply be unified, as the content is almost the same.
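Roughly what I have in mind, as a sketch only (the helper name and the train=False flag passed to auto_add_sampler are illustrative):

def _init_eval_dataloaders(self, model, mode):
    # mode is 'val' or 'test'; the rest of the logic is identical,
    # so one helper can back both init_val_dataloader and init_test_dataloader.
    dataloaders = getattr(model, f'{mode}_dataloader')()
    if not isinstance(dataloaders, (list, tuple)):
        dataloaders = [dataloaders]
    return [self.auto_add_sampler(dl, train=False) for dl in dataloaders]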
def __set_fit_dataloaders(self, model, train_dataloader, val_dataloaders, test_dataloaders):
    # when dataloader is passed via fit, patch the train_dataloader
    # functions to overwrite with these implementations
    if train_dataloader is not None:
this may be unified...
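For example, the three near-identical branches could be collapsed into one loop, along these lines (a sketch, not the PR code):

def __set_fit_dataloaders(self, model, train_dataloader, val_dataloaders, test_dataloaders):
    # Patch the model's *_dataloader hooks with whatever was passed to .fit.
    overrides = {
        'train_dataloader': train_dataloader,
        'val_dataloader': val_dataloaders,
        'test_dataloader': test_dataloaders,
    }
    for name, loader in overrides.items():
        if loader is not None:
            # default argument captures the current loader, avoiding late binding
            setattr(model, name, lambda loader=loader: loader)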
def prepare_data(self):
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (1.0,))])
    dataset = TestingMNIST(root=self.hparams.data_root, train=True,
duplicated
# acc
labels_hat = torch.argmax(y_hat, dim=1)
test_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)
isn't it already a tensor here?
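If y and y_hat are tensors here, the accuracy can stay a tensor without the .item() round-trip, e.g.:

# equivalent, but keeps the result as a tensor
labels_hat = torch.argmax(y_hat, dim=1)
test_acc = (y == labels_hat).float().mean()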
    return output


class LightningTestFitMultipleTestDataloadersMixin:
it is not easy to see what the difference is from LightningTestFitSingleTestDataloadersMixin
I still feel this puts too much restriction on the data loader.
I don't disagree. Maybe a good approach is to check that it's a PyTorch dataloader? What other dataloaders are there?
What I meant is that Lightning should not touch the data loader that the user provides unless necessary. I can open up an issue if you think that's a better place for this discussion.
We could make this a method you can override in the LightningModule. In what use case do you need to keep the original loader? We could also use a flag in the trainer:
I have multiple dataloaders that each load images in order. The auto_add_sampler function says it shouldn't do anything when the user provides a sampler; we should at least fix this part.
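Concretely, a check along these lines would cover that case (a sketch; the helper name is illustrative):

from torch.utils.data import SequentialSampler, RandomSampler

def has_default_sampler(dataloader):
    # Only the default samplers should be replaced; a user-provided sampler
    # (e.g. one that keeps images in order) must be left untouched.
    return isinstance(dataloader.sampler, (SequentialSampler, RandomSampler))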
We're currently addressing this in the fix for #953 - will PR soon. The solution is to re-write the
@ethanwharris I will repost my concerns there then. Thanks for the pointer.
@versatran01 that would be cool, looking forward to your points 🤖
* added get dataloaders directly using a getter
* deleted decorator
* added prepare_data hook
* refactored dataloader init
* added dataloader reset flag and main loop
* made changes
* fixed bad loaders
* fixed error in .fit with loaders
* fixes Lightning-AI#909
* bug fix
* Fixes Lightning-AI#902
Fixes #928
Fixes #927
Fixes #922
Fixes #909
Fixes #859
Fixes #902
Removes data_decorator
Adds prepare_data
Lightning needs a step to download data on proc 0 only
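For example (a sketch; hparams.data_root and the MNIST setup are taken from the test code above):

import pytorch_lightning as pl
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import MNIST

class MyModule(pl.LightningModule):
    def prepare_data(self):
        # runs on proc 0 only: safe place to download / write to disk
        MNIST(root=self.hparams.data_root, train=True, download=True)

    def train_dataloader(self):
        # runs on every process: build the dataset and loader, no downloads here
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.5,), (1.0,))])
        dataset = MNIST(root=self.hparams.data_root, train=True, transform=transform)
        return DataLoader(dataset, batch_size=32)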
Added new flags
Fixes .fit with data
The .fit(dataloaders) path was buggy. Simplified it to hook into the rest of the framework instead of running its own ad hoc process.
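Usage then looks roughly like this (a sketch; the argument names follow __set_fit_dataloaders above, and MyModule, hparams, train_loader, and val_loader are placeholders):

from pytorch_lightning import Trainer

model = MyModule(hparams)
trainer = Trainer()
trainer.fit(model,
            train_dataloader=train_loader,
            val_dataloaders=val_loader)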
Automatic sampler
Now the user doesn't have to mess around with samplers on DDP or TPUs; Lightning sets them up automatically.
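Conceptually, the automatic sampler does something like this under DDP (a simplified sketch of the idea, not the exact implementation):

from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def auto_add_sampler(dataloader, use_ddp):
    # Re-create the user's loader with a DistributedSampler so each rank
    # sees its own shard; the user keeps returning plain DataLoaders.
    if not use_ddp:
        return dataloader
    sampler = DistributedSampler(dataloader.dataset)
    return DataLoader(dataloader.dataset,
                      batch_size=dataloader.batch_size,
                      shuffle=False,
                      sampler=sampler)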