Add support for IterableDatasets everywhere #1104
@@ -213,6 +213,48 @@ def test_dataloader(self):
        return self._dataloader(train=False)


class CustomInfDataloader:
I would rather create a complete DataLoader so it is easier to understand. What about this?
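import numpy as np

class CustomInfDataloader:
    def __init__(self, dataset, batch_size, shuffle):
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        idxs = []
        while True:
            # Refill the index pool once fewer than one batch of indices remains.
            if len(idxs) < self.batch_size:
                # A list is needed here: a range cannot be shuffled in place.
                idxs = list(range(len(self.dataset)))
                if self.shuffle:
                    np.random.shuffle(idxs)
            batch = [self.dataset[idx] for idx in idxs[:self.batch_size]]
            yield batch
            # Drop the indices that were just consumed.
            idxs = idxs[len(batch):]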
torch.utils.data.DataLoader does quite a bit more than this (e.g. collate functions, samplers, etc.), so it is probably better to wrap it rather than rewrite it. Also, we don't really have access to the dataset when this is created, only the dataloader.
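To make the "wrap rather than rewrite" idea concrete, here is a minimal sketch of a wrapper that restarts the underlying loader whenever it is exhausted. The name InfiniteLoader is hypothetical and is not the code from this PR. The sketch only assumes that the wrapped object can be iterated more than once, which holds for a torch DataLoader; a plain list of batches stands in for one here so the example runs without torch.

```python
from itertools import islice


class InfiniteLoader:
    """Hypothetical wrapper that cycles forever over any re-iterable loader."""

    def __init__(self, loader):
        # Only the loader is stored; the underlying dataset is never touched.
        self.loader = loader

    def __iter__(self):
        while True:
            yielded = False
            # Each pass calls iter() on the wrapped loader again, so batching,
            # collation and sampling stay the wrapped loader's responsibility.
            for batch in self.loader:
                yielded = True
                yield batch
            if not yielded:
                # Stop instead of spinning forever on an empty loader.
                return


# Stand-in for a DataLoader: any object that can be iterated repeatedly.
fake_loader = [[0, 1], [2, 3], [4]]

# Take five batches from the endless stream; the loader restarts after three.
first_five = list(islice(InfiniteLoader(fake_loader), 5))
```

Because the wrapper has no length and never raises StopIteration for a non-empty loader, the consumer has to decide when to stop, for example with islice as above or with a step limit in the training loop.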
@Borda we generally want to avoid duplicating torch functionality. Otherwise the project scope will blow up quickly.
I do agree, I just found this construction quite difficult to follow...
Hey there, we have added a GPU CI test, so could we kindly ask you to rebase on or merge master? That will trigger these tests so we do not need to run them manually. Thanks for your understanding 🤖
@Borda Done :)
* Add support for IterableDatasets everywhere
* Added type hints, simplified code and improved coverage in data_loading.py
* Update CHANGELOG.md
Before submitting
What does this PR do?
Fixes #948
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃