-
Notifications
You must be signed in to change notification settings - Fork 6.8k
improve dataloader signals and messages #16114
Conversation
@szha, @eric-haibin-lin @sxjscience for review |
ping for review 😄 |
Thanks @zhreshold! I wonder if adding a default timeout is a backwards incompatible change that affects our semantic versioning guarantee? (some user code may currently run 130 seconds per batch, and it will stop working when users upgrade to MXNet 1.6) If I understand correctly, the motivation to introduce timeout is:
|
@leezu the timeout is for dataloader workers, not including the network training on the main thread. Is there any use case where each batch on cpu can take up to 2min? |
I don't think it's a common or intended use-case to have workers process a batch for more than 120 seconds. It's still a breaking change, but it may be fine to break a feature that's potentially unused (ie noone may rely on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's reasonable to assume that the loading time of a batch is less than 120 seconds.
I have a suggestion: Dataloader does not terminate the program but print a warning when timeout. Users decide whether to terminate it. |
@wkcn It's mandatory to have timeout in order to catch excetions in subprocess due to a python bug. |
fa5052b
to
76ca2cc
Compare
fe975d4
to
aea651d
Compare
* improve dataloader signals and messages * address comments * fix spawn tests on windows
Description
Improve dataloader use experience.
With this PR, dataloaders are
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments