Add support for IterableDatasets everywhere #1104

ethanwharris · 2020-03-09T16:14:45Z

Before submitting

Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
Did you write any new necessary tests?

What does this PR do?

Fixes #948

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

pytorch_lightning/trainer/data_loading.py

tests/models/mixins.py

Borda · 2020-03-09T23:19:50Z

tests/models/mixins.py

@@ -213,6 +213,48 @@ def test_dataloader(self):
        return self._dataloader(train=False)


+class CustomInfDataloader:


I would rather create a complete Dataloader so it is easier to understand... what about?

import numpy as np


class CustomInfDataloader:
    def __init__(self, dataset, batch_size, shuffle):
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        idxs = []
        while True:
            # refill (and optionally reshuffle) once less than one batch remains
            if len(idxs) < self.batch_size:
                # a list, not a range: np.random.shuffle needs a mutable sequence
                idxs = list(range(len(self.dataset)))
                if self.shuffle:
                    np.random.shuffle(idxs)
            batch = [self.dataset[idx] for idx in idxs[:self.batch_size]]
            yield batch
            idxs = idxs[len(batch):]

torch.utils.data.DataLoader does quite a bit more than this (e.g. collate functions, samplers, etc.), so it is probably better to wrap it rather than rewrite it. Also, we don't really have access to the dataset when this is created, only the dataloader.

@Borda we generally want to avoid duplicating torch functionality. Otherwise the project scope will blow up quickly.

I do agree, I just found this construction quite difficult to follow...
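For readers following this thread, the "wrap it rather than rewrite it" idea can be sketched as below. This is a minimal illustration, not the class added in this PR: the name `InfiniteLoaderWrapper` is hypothetical, and a plain list of batches stands in for a real `torch.utils.data.DataLoader` so the snippet runs without torch. The only assumption about the wrapped object is that it can be iterated more than once.

```python
from itertools import islice


class InfiniteLoaderWrapper:
    """Hypothetical wrapper: cycles over any re-iterable loader forever."""

    def __init__(self, dataloader):
        # `dataloader` is anything that can be iterated repeatedly, for
        # example a torch.utils.data.DataLoader; a plain list works too.
        self.dataloader = dataloader

    def __iter__(self):
        while True:
            yielded = False
            for batch in self.dataloader:
                yielded = True
                yield batch
            if not yielded:
                # Guard against spinning forever on an empty loader.
                return


# Stand-in for a real DataLoader: three pre-built batches.
batches = [[0, 1], [2, 3], [4, 5]]
loader = InfiniteLoaderWrapper(batches)

# The wrapper deliberately defines no __len__, so take a fixed number of batches.
first_five = list(islice(loader, 5))
```

Because the wrapper only delegates iteration, batching, collation and sampling all stay with the wrapped loader, and the wrapper never needs access to the underlying dataset.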


Borda · 2020-03-11T20:42:12Z

Hey there, we have added GPU CI tests, so could we kindly ask you to rebase onto or merge master? That will trigger these tests so we do not need to run them manually. Thanks for your understanding 🤖

ethanwharris · 2020-03-12T12:17:23Z

@Borda Done :)

* Add support for IterableDatasets everywhere
* Added type hints, simplified code and improved coverage in data_loading.py
* Update CHANGELOG.md

Add support for IterableDatasets everywhere

d97ed43

ethanwharris requested a review from a team March 9, 2020 16:14

Borda reviewed Mar 9, 2020

Borda added this to the 0.7.2 milestone Mar 9, 2020

Borda added the feature Is an improvement or enhancement label Mar 9, 2020

Added type hints, simplified code and improved coverage in data_loading.py

326255f

Borda mentioned this pull request Mar 11, 2020

Extend docs with multiple dataloader with common cases #1089

Closed

ethanwharris added 2 commits March 12, 2020 12:08

Merge branch 'master' into feature/iterable_datasets

5c00fe0

Update CHANGELOG.md

4dc2643

ethanwharris added the ready PRs ready to be merged label Mar 12, 2020

williamFalcon merged commit 2b3f443 into Lightning-AI:master Mar 12, 2020

ethanwharris deleted the feature/iterable_datasets branch March 12, 2020 17:40

Borda modified the milestones: v0.7., v0.7.x Apr 18, 2021
