Introduce stateful dataloader (alternative) #2991
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks! I took a quick look and left a few comments. The only reason I didn't immediately go for pure composition was that I was worried about breaking something if the class inheritance structure changed. If that's not an issue, then either I could refactor my PR to be closer to yours, or we could iterate on this PR instead. It's up to you. Also, I made some unit tests in PR #2895, in case you can reuse them and save yourself the time of writing your own.
I haven't followed the whole history of this feature. @muellerzr, is the issue mentioned by @byi8220 addressed, namely that this PR changes the inheritance structure of the data loaders and could potentially break existing code that checks `isinstance(dl, DataLoader)`?
@byi8220 all of those checks are about whether we should preprocess someone's existing dataloader.
If there's some particular case you're worried about, let me know.
@BenjaminBossan Yes. This PR overall LGTM, and the main issues I thought of were:

Yeah, that was the original idea in the previous PR, which is pretty hacky.

Makes sense. In this case, a …

There was nothing in particular I could think of; breaking the class structure just felt fishy to me.
src/accelerate/data_loader.py (outdated)

```diff
-class DataLoaderShard(DataLoader, DataLoaderStateMixin):
+class DataLoaderShard(DataLoaderStateMixin):
```
Are we 100% okay with changing this? Composition is the better way to do this, but the only issue I could see is existing checks for `isinstance(dl, DataLoader)` breaking.
Would something like this work? `StatefulDataLoader` is a subclass of `torch.utils.data.DataLoader`:

```diff
-class DataLoaderShard(DataLoaderStateMixin):
+_BaseClass = torch.utils.data.DataLoader
+if is_torchdata_available():
+    _BaseClass = torchdata.stateful_dataloader.StatefulDataLoader
+class DataLoaderShard(_BaseClass, DataLoaderStateMixin):
```
No, because this assumes users are blanket using the `StatefulDataLoader` whenever torchdata is available. We do not want this; it must be configurable by the user.
Do users call the `DataLoaderShard` constructor directly? If it were a factory method, it would be possible to switch out the class based on config. If that's not an option, maybe something like a `StatefulDataLoaderShard` class that users create instead?
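The factory idea could be sketched roughly as below. Everything here is hypothetical (`create_shard`, the `use_stateful_dataloader` flag, and the stand-in loader classes are made up, not accelerate's actual API); the point is only that the base class is chosen per call from user configuration rather than fixed by inheritance at import time:

```python
# Hypothetical sketch: pick the shard's base class from user config
# instead of baking it into the class hierarchy at import time.
class PlainLoader:
    """Stand-in for torch.utils.data.DataLoader."""
    def __init__(self, dataset):
        self.dataset = dataset

    def __iter__(self):
        return iter(self.dataset)


class StatefulLoader(PlainLoader):
    """Stand-in for torchdata.stateful_dataloader.StatefulDataLoader."""
    def state_dict(self):
        return {"dataset_len": len(self.dataset)}


def create_shard(dataset, use_stateful_dataloader=False):
    base = StatefulLoader if use_stateful_dataloader else PlainLoader
    # The returned object is an instance of the chosen base, so existing
    # isinstance(dl, PlainLoader) checks keep working either way.
    shard_cls = type("DataLoaderShard", (base,), {})
    return shard_cls(dataset)


plain = create_shard([1, 2, 3])
stateful = create_shard([1, 2, 3], use_stateful_dataloader=True)
assert isinstance(plain, PlainLoader) and not hasattr(plain, "state_dict")
assert isinstance(stateful, PlainLoader)
assert stateful.state_dict() == {"dataset_len": 3}
```

The trade-off is that the shard type is no longer a single importable class, which is presumably why it was dismissed as "not an option" for existing callers of the constructor.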
src/accelerate/data_loader.py (outdated)

```diff
 self.set_epoch(self.iteration)
-dataloader_iter = super().__iter__()
+dataloader_iter = iter(self._dataloader)
```
I think there is an issue which prevents you from simply delegating `state_dict` and `load_state_dict`: `dataloader_iter` is actually one batch ahead of what we are going to yield. It always points to the next batch, not the current batch. I got around this in my PR by using a `_save_state_dict` function to hold the previous value of `state_dict`. I also wrote some unit tests to catch this issue.
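A toy illustration of this off-by-one (not accelerate's code; `ToyStatefulLoader` and `prefetching_iter` are made-up stand-ins): because the wrapper advances the inner iterator ahead of what it yields, asking the inner loader for its state after a yield describes the *next* batch. Saving a copy of `state_dict` before each advance, which is the idea behind the `_save_state_dict` workaround, keeps the state for the batch actually yielded:

```python
class ToyStatefulLoader:
    """Made-up stand-in for a stateful dataloader."""
    def __init__(self, data):
        self.data = data
        self.idx = 0

    def __iter__(self):
        self.idx = 0
        return self

    def __next__(self):
        if self.idx >= len(self.data):
            raise StopIteration
        batch = self.data[self.idx]
        self.idx += 1
        return batch

    def state_dict(self):
        return {"idx": self.idx}


def prefetching_iter(loader, saved):
    """Yield batches while saving the state that describes the batch
    about to be yielded (mimicking the _save_state_dict idea)."""
    inner = iter(loader)
    while True:
        # Snapshot *before* advancing: this copy describes current_batch.
        saved["state"] = loader.state_dict()
        try:
            batch = next(inner)  # the loader is now one step ahead
        except StopIteration:
            return
        yield batch


loader = ToyStatefulLoader([10, 20, 30])
saved = {}
batches = prefetching_iter(loader, saved)
next(batches)  # yields 10
next(batches)  # yields 20
# Asking the loader directly is off by one: it points past batch 20...
assert loader.state_dict() == {"idx": 2}
# ...while the saved copy still describes batch 20.
assert saved["state"] == {"idx": 1}
```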
I see, this is because there's a pre-fetch to send batches to the device.
src/accelerate/data_loader.py (outdated)

```python
    def __getattr__(self, name):
        # Delegate attribute access to the internal instance
        return getattr(self._dataloader, name)

    def __len__(self):
        return len(self._dataloader)
```
This is a lot cleaner than my approach, but it still feels a bit magical. Are we sure `__len__` is the only builtin that needs to be supported?
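For context on why `__len__` has to be spelled out at all: Python looks up special methods on the *type*, not the instance, so `__getattr__` delegation never sees implicit dunder calls like `len()` (and the same concern would apply to other builtins). A minimal demonstration with generic stand-in classes:

```python
class Inner:
    def __len__(self):
        return 3


class Wrapper:
    def __init__(self, inner):
        self._inner = inner

    def __getattr__(self, name):
        # Handles wrapper.batch_size, wrapper.dataset, etc. ...
        return getattr(self._inner, name)


w = Wrapper(Inner())
# Explicit attribute access goes through __getattr__ and works:
assert w.__len__() == 3
# ...but the len() builtin looks up __len__ on type(w), bypassing
# __getattr__ entirely, so it raises TypeError:
try:
    len(w)
    raised = False
except TypeError:
    raised = True
assert raised
```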
Given the fact that the Trainer requires a raw `DataLoader`, I'm going back to the drawing board with the torch team, as no solutions feel right.
To confirm, this is blocked on a hard inheritance requirement? And all current solutions feel extremely ugly?

Hm, it might be overkill, but would it be sufficient to ask the torch team to separate `DataLoader` into an interface and a concrete class, then have the accelerate derivatives implement the interface? Feels a bit bloated, but it seems like the canonical solution.
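A rough sketch of what such an interface/concrete split could look like (hypothetical names, using `abc`; torch's actual `DataLoader` is not structured this way today). The `isinstance` checks would target the interface, so a composition-based wrapper satisfies them without inheriting from the concrete loader:

```python
import abc


class DataLoaderInterface(abc.ABC):
    """Hypothetical abstract interface the torch team would provide."""

    @abc.abstractmethod
    def __iter__(self): ...

    @abc.abstractmethod
    def __len__(self): ...


class ConcreteLoader(DataLoaderInterface):
    """Stand-in for the concrete torch DataLoader."""
    def __init__(self, data):
        self._data = data

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


class ShardWrapper(DataLoaderInterface):
    """Composition-based shard that still satisfies the interface."""
    def __init__(self, loader):
        self._loader = loader

    def __iter__(self):
        return iter(self._loader)

    def __len__(self):
        return len(self._loader)


shard = ShardWrapper(ConcreteLoader([1, 2]))
# Existing checks would target the interface, not the concrete class:
assert isinstance(shard, DataLoaderInterface)
assert list(shard) == [1, 2]
```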
@byi8220 @BenjaminBossan and I were thinking along similar lines.
What does this PR do?
This PR is an alternative to #2895 which uses composition in the end to be a pinch less "magical". Since we're also using composition, if we eventually want any "base iterable" type of loader to be compatible with or use these methods, this logic could scale, as long as the underlying assumption is that they behave like `DataLoader`s.

Fixes #2859
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@SunMarc @BenjaminBossan @byi8220