-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add disk usage check before downloading files #19041
Conversation
|
||
while (usage.free / 1000 / 1000 / 1000) <= threshold_in_gb: | ||
sleep(sleep_time) | ||
usage = shutil.disk_usage("/") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should probably check the disk usage with the path where we are writing to. E.g. instead of /
it should be the "cache dir" probably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is safer to check the overall machine state to avoid any failures as this would be a costly mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My argument is that it would be a costly mistake not to measure the disk usage where we are saving. For example, if your /home is not on the same disk as /. The function could simply take a path as input, like
_wait_for_disk_usage_higher_than_threshold(input_dir.path)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The root
should include everything on the machine, so it would independent on wherever is home.
|
||
while (usage.free / 1000 / 1000 / 1000) <= threshold_in_gb: | ||
sleep(sleep_time) | ||
usage = shutil.disk_usage("/") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My argument is that it would be a costly mistake not to measure the disk usage where we are saving. For example, if your /home is not on the same disk as /. The function could simply take a path as input, like
_wait_for_disk_usage_higher_than_threshold(input_dir.path)
Co-authored-by: thomas <[email protected]> (cherry picked from commit d3df127)
Co-authored-by: thomas <[email protected]> (cherry picked from commit d3df127)
What does this PR do?
This PR fixes a bug where files would be downloaded until there is no space left of the machine. This happens if the processing and removers aren't fast enough to catch up with the downloaders.
Fixes #<issue_number>
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:
Reviewer checklist
📚 Documentation preview 📚: https://pytorch-lightning--19041.org.readthedocs.build/en/19041/
cc @Borda @carmocca