Downloading MNIST fails due to Cloudflare protection #63

Closed
Steven-Adriaensen opened this issue Mar 10, 2021 · 4 comments
Labels
bug (Something isn't working), sgd

Comments

@Steven-Adriaensen
Contributor

Automatically downloading MNIST currently fails with HTTP Error 403, because Cloudflare blocks requests that use the default urllib headers.

Issue & solution:
https://stackoverflow.com/questions/60548000/getting-http-error-403-forbidden-error-when-download-mnist-dataset
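For reference, a minimal sketch of what "default urllib headers" means in practice. It only inspects headers and makes no network request. The helper names and the example URL in the usage note are illustrative and not part of the benchmark code.

```python
import urllib.request


def default_user_agent() -> str:
    """Return the User-agent that a fresh urllib opener sends by default."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) tuples
    return dict(opener.addheaders)["User-agent"]


def request_with_user_agent(url: str, user_agent: str = "Mozilla/5.0") -> urllib.request.Request:
    """Build a request that carries a browser-like User-agent instead."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    # Prints the Python-specific agent string, e.g. one starting with "Python-urllib/"
    print(default_user_agent())
```

Calling `request_with_user_agent("https://example.com/file")` (a placeholder URL) gives a request object whose `User-agent` header is `Mozilla/5.0`. Whether that is enough to get past the block is a separate question.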

Steven-Adriaensen added the bug and sgd labels on Mar 10, 2021
@TheEimer
Contributor

We could potentially also just update torchvision, since I at least had issues with these fixes. In that case we should investigate whether the update would cause problems in the SGD benchmark.

@maximilianreimer
Contributor

I am currently investigating this bug, and there are three things that bother me:

  1. The fixes currently in the code do not seem to work for me.
  2. I added an alternative fix, suggested in the issue above, and it seems to work at least sometimes, but not always (see listing below).
  3. Even when I tested downloading the dataset with the current torch and torchvision versions, using just train_dataset = datasets.MNIST('../data', train=True, download=True), I got the same error.

I get the feeling that you get blocked if you try to download it too often. Maybe we should make sure that in our test environment the file is cached across multiple runs.

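import urllib.request

from torchvision import datasets

# Replace urllib's default User-agent, which Cloudflare rejects with HTTP 403
opener = urllib.request.build_opener()
opener.addheaders = [("User-agent", "Mozilla/5.0")]
urllib.request.install_opener(opener)

train_dataset = datasets.MNIST('../data', train=True, download=True)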
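The caching idea mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `MNIST_CACHE_DIR` environment variable and the helper names are hypothetical, and the `MNIST/raw` layout with these four file names is assumed from recent torchvision versions (older versions checked processed `.pt` files instead).

```python
import os
from pathlib import Path

# Assumption: raw MNIST file names as stored by recent torchvision versions
RAW_FILES = (
    "train-images-idx3-ubyte",
    "train-labels-idx1-ubyte",
    "t10k-images-idx3-ubyte",
    "t10k-labels-idx1-ubyte",
)


def cache_root() -> Path:
    """Pick a directory that survives across test runs (hypothetical env var)."""
    default = Path.home() / ".cache" / "mnist"
    return Path(os.environ.get("MNIST_CACHE_DIR", default))


def is_cached(root: Path) -> bool:
    """Check whether all raw MNIST files are already present under root."""
    raw = root / "MNIST" / "raw"
    return all((raw / name).is_file() for name in RAW_FILES)


def load_mnist(train: bool = True):
    """Load MNIST, contacting the server only if the cache is incomplete."""
    from torchvision import datasets  # imported lazily; requires torchvision

    root = cache_root()
    return datasets.MNIST(str(root), train=train, download=not is_cached(root))
```

If the layout assumption is wrong for the installed version, `is_cached` returns False and the call falls back to `download=True`, which is the current behaviour. In a CI setup, the directory returned by `cache_root()` would be the one to persist between runs.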

@TheEimer
Contributor

I tried both solutions, as well as the current torch version (which is supposed to work according to their GitHub issues), without success, so I think caching the dataset would be best. Everything else seems like a lot of trouble that's very hard to debug.

@maximilianreimer
Contributor

Apparently, it has only been fixed in the nightly build, but the fix will be incorporated in the next minor release (see PyTorch issue 3549). I added a hotfix for now.

maximilianreimer added a commit that referenced this issue Mar 26, 2021
#63
This will be fixed in the next release of torch vision.