This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

download_data from flash.core.data.utils connects to the internet before checking if a file exists #1611

Closed
surak opened this issue Jun 20, 2023 · 1 comment · Fixed by #1666
Labels
bug / fix Something isn't working help wanted Extra attention is needed

Comments


surak commented Jun 20, 2023

🐛 Bug

On many supercomputers, the usual workflow for ML code is to first run the download step on the login nodes (which have internet access) and stop the code right before the actual training starts.

Then, when you run on the compute nodes (the ones with the actual GPUs and no internet access), you let the code run to the end. In other frameworks, data downloaders detect that the files are already present and skip the download before ever trying to connect to the internet.

Flash first tries to check the file size in this line, which will freeze on a machine without internet.

To Reproduce

Call "download_data" on a machine with no internet access.

Code sample

from flash.core.data.utils import download_data
download_data("https://pl-flash-data.s3.amazonaws.com/hymenoptera_data.zip", "data/")

Expected behavior

If the file is already there, skip download
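A minimal sketch of the expected ordering, not Flash's actual implementation: check the local filesystem first and only call the downloader when the file is missing. The helper names `local_path_for` and `download_if_missing` are hypothetical, and the sketch assumes the downloaded archive is kept in the target directory under the base name of the URL.

```python
import os
from urllib.parse import urlsplit


def local_path_for(url: str, dest_dir: str) -> str:
    """Hypothetical helper: path the archive is assumed to have inside dest_dir."""
    filename = os.path.basename(urlsplit(url).path)
    return os.path.join(dest_dir, filename)


def download_if_missing(url: str, dest_dir: str, downloader) -> bool:
    """Call downloader(url, dest_dir) only when the target file is absent.

    Returns True if a download was attempted, False if it was skipped.
    """
    target = local_path_for(url, dest_dir)
    if os.path.isfile(target):
        # File already present: return before any network access happens.
        return False
    downloader(url, dest_dir)
    return True
```

With Flash installed, this could be used as `download_if_missing("https://pl-flash-data.s3.amazonaws.com/hymenoptera_data.zip", "data/", download_data)`. Like the fastdownload behavior described under "Additional context", this sketch does not verify the size of an existing file, so a truncated download from an earlier run would go undetected.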

Environment

  • OS (e.g., Linux): CentOS 8.6
  • Python version: 3.10
  • PyTorch/Lightning/Flash Version (e.g., 1.10/1.5/0.7): PyTorch 1.12.1, Lightning 0.8.4, Flash 0.8.1.post0
  • GPU models and configuration: 16x A100 40GB
  • Any other relevant information:

Additional context

Fast.ai's fastdownload, for example, does not suffer from this: if the file is there, it doesn't try to download it, even if it's the wrong size: fastdownload link

@surak surak added bug / fix Something isn't working help wanted Extra attention is needed labels Jun 20, 2023
Member

Borda commented Jun 30, 2023

Nice catch. 🕵️ Would you be interested in sending a fix? 🐰

@Borda Borda changed the title download_data from flash.core.data.utils connects to the internet before checking if a file exists download_data from flash.core.data.utils connects to the internet before checking if a file exists Aug 9, 2023
2 participants