remove redundant iterator call to data fetcher in loops #9117
Conversation
Hey @awaelchli. Thanks, definitely a good catch!
Can we add a test to prevent regression?
LGTM!
Hey @awaelchli, added the same fix for the evaluation loop. Thanks for finding this out.
Codecov Report

```diff
@@           Coverage Diff           @@
##           master    #9117   +/-   ##
=======================================
- Coverage      92%      92%     -0%
=======================================
  Files         176      176
  Lines       14666    14865    +199
=======================================
+ Hits        13500    13677    +177
- Misses       1166     1188     +22
```
Hey @awaelchli. Same here. Hopefully, your test can be used to prevent regression.
Making `evaluation_loop` depend on the fetcher doesn't seem needed to me -- do we need anything other than an iterator proper?
```python
if not isinstance(dataloader_iter, DataLoaderIterDataFetcher):
    dataloader_iter = enumerate(dataloader_iter, batch_idx)

# restore iteration
if not isinstance(data_fetcher, DataLoaderIterDataFetcher):
```
Why not simply move this into the contract of this iterator? Basically, make the contract such that these data fetchers will always return the `batch_idx` as part of it.
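The suggestion above can be sketched as follows. This is a hypothetical toy class, not the Lightning API -- the name `IndexedDataFetcher` and its constructor are assumptions purely for illustration of the "fetcher always yields `(batch_idx, batch)`" contract:

```python
from typing import Any, Iterator, Tuple


class IndexedDataFetcher:
    """Toy fetcher whose iteration contract always yields (batch_idx, batch).

    With this contract, the consuming loop never needs to wrap the fetcher
    in enumerate() itself -- the index is produced inside the iterator.
    """

    def __init__(self, iterable, start_batch_idx: int = 0):
        self.iterable = iterable
        self.start_batch_idx = start_batch_idx

    def __iter__(self) -> Iterator[Tuple[int, Any]]:
        # batch_idx is part of every yielded item, fulfilling the contract
        yield from enumerate(self.iterable, self.start_batch_idx)


fetcher = IndexedDataFetcher(["a", "b"], start_batch_idx=5)
assert list(fetcher) == [(5, "a"), (6, "b")]
```

The `isinstance` check in the loop body then disappears, because every fetcher satisfies the same contract.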
yes, I suppose it could be done like that :)
```diff
@@ -105,9 +105,11 @@ def _process_training_step_output(
     return results, hiddens


-def _prepare_dataloader_iter(dataloader_iter: Iterator, batch_idx: int) -> Iterator:
+def _prepare_dataloader_iter(data_fetcher: AbstractDataFetcher, batch_idx: int) -> Iterator:
```
How much do we actually need to expose `AbstractDataFetcher` here versus just an iterator? It seems to me that we're not adding much value by exposing a new interface here.
The custom logic below really should be part of the iterator implementation itself, as I comment below.
yes, the iterator type annotation would have sufficed here
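The reviewers' point can be sketched like this. The function below is an assumed, simplified stand-in for the helper in the diff (not the actual Lightning code): annotating the parameter as a plain `Iterator` keeps the helper decoupled from any concrete fetcher class:

```python
from typing import Any, Iterator, Tuple


def prepare_dataloader_iter(
    dataloader_iter: Iterator, batch_idx: int
) -> Iterator[Tuple[int, Any]]:
    """Wrap any iterator so it yields (batch_idx, batch) pairs.

    Accepting a plain Iterator means callers can pass a dataloader iterator,
    a fetcher, or anything iterable -- no dependency on a fetcher interface.
    """
    return enumerate(dataloader_iter, batch_idx)


it = prepare_dataloader_iter(iter(["x", "y"]), batch_idx=3)
assert list(it) == [(3, "x"), (4, "y")]
```

The narrower annotation documents exactly what the helper needs: something it can enumerate, nothing more.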
What does this PR do?
#9052 moved the iterator management from the fit loop to the training epoch loop. The dataloader gets called with `iter()`, but later on also with `enumerate()` on top of it. This will not play nicely with #8950, where it's important to track the state of an iterator. This PR removes the additional `iter()` call. A full integration test for iterator state is present in #8950.
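The problem with the redundant call can be illustrated with a minimal sketch. `StatefulIterator` below is a toy class (an assumption, not Lightning code) whose `__iter__` resets internal state, as a fetcher that re-creates its underlying dataloader iterator might; a second `iter()` call (which `enumerate()` performs implicitly) then silently discards progress:

```python
class StatefulIterator:
    """Toy iterator that records how many times __iter__ runs."""

    def __init__(self, data):
        self.data = data
        self.iter_calls = 0
        self.position = 0

    def __iter__(self):
        self.iter_calls += 1
        self.position = 0  # state reset on every iter() call
        return self

    def __next__(self):
        if self.position >= len(self.data):
            raise StopIteration
        value = self.data[self.position]
        self.position += 1
        return value


it = StatefulIterator([10, 20, 30])
inner = iter(it)          # first iter() call
next(inner)               # advance one step
wrapped = enumerate(it)   # enumerate() calls iter() again, resetting state
assert it.iter_calls == 2
assert next(wrapped) == (0, 10)  # progress lost: back to the first element
```

Calling `iter()` exactly once per epoch keeps the iterator's state well-defined, which is what the state-tracking work in #8950 relies on.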
Before submitting
NOT NEEDED, since the change linked above was only introduced on master recently.
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet-list:
Did you have fun?
I made sure I had fun coding 🙃