Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caching MNIST dataset for testing #917

Merged

Conversation

djbyrne
Copy link
Contributor

@djbyrne djbyrne commented Feb 23, 2020

  • Added MNIST datset to the tests directory

  • Caches dataset based off hash of the test.pt file

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
  • Did you read the contributor guideline?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Fixes #859 .
Caches the MNIST dataset. This will prevent the CI/CD testing from crashing when the link to the original dataset is unavailable and will speed up CI/CD as the pipeline will not have to re-download files.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

Donal Byrne added 3 commits February 23, 2020 10:11
* Added MNIST datset to the tests directory

* Caches dataset based off hash of the test.pt file
@djbyrne djbyrne requested a review from Borda February 23, 2020 12:45
@awaelchli
Copy link
Contributor

is it not possible to download it once and reuse? otherwise the package will grow by 100+ MB for only the tests and most users don't want that.

@williamFalcon
Copy link
Contributor

thanks for the PR! we can’t add a dataset to the framework. @Borda

@Borda
Copy link
Member

Borda commented Feb 23, 2020

We are not adding them to package, it is just in CI test time, e.g. running PRs... Let review the PR =)

@williamFalcon
Copy link
Contributor

i see binaries in the /test folder

Copy link
Member

@Borda Borda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see now, there is no need to add the dataset to git, torchvision check if it has it in temp and if not, then it is downloaded...
This PR shall contain only changes in Github action config

@Borda Borda added ci Continuous Integration feature Is an improvement or enhancement need fix labels Feb 23, 2020
@Borda Borda changed the title Caching MNIST dataset for testing #859 Caching MNIST dataset for testing Feb 23, 2020
@Borda
Copy link
Member

Borda commented Feb 23, 2020

i see binaries in the /test folder

Yeah the was probably misunderstanding when we talked about caching lol

@djbyrne
Copy link
Contributor Author

djbyrne commented Feb 23, 2020

i see binaries in the /test folder

Yeah the was probably misunderstanding when we talked about caching lol

@Borda apologies, I misunderstood the request. Ill remove the MNIST files from the directory.

@djbyrne
Copy link
Contributor Author

djbyrne commented Feb 23, 2020

It looks like the caching of the pip files is only being done for Linux. Should I add in caching of the locations for Windows and MacOS as well?

@williamFalcon
Copy link
Contributor

@Borda good to go?

@williamFalcon
Copy link
Contributor

@djbyrne can you subclass MNIST dataset for tests only and add the urls we have?

@djbyrne
Copy link
Contributor Author

djbyrne commented Feb 24, 2020

@williamFalcon this is just to override the resources list of links to the dataset right. This is different to the request to have an MNIST class decoupled from torchvision that @Borda mentioned?

@Borda
Copy link
Member

Borda commented Feb 24, 2020

Pls do not remove the blank lines, then it shall be fine for Linux tests
It would be nice to have the caching also for other platforms...

Copy link
Member

@Borda Borda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

@williamFalcon williamFalcon added this to the 0.6.1 milestone Feb 25, 2020
@Borda Borda added ready PRs ready to be merged and removed need fix labels Feb 25, 2020
@Borda Borda modified the milestones: 0.6.1, 0.6.2 Feb 25, 2020
@williamFalcon williamFalcon merged commit 9854084 into Lightning-AI:master Feb 25, 2020
@Borda Borda mentioned this pull request Feb 25, 2020
@Borda Borda modified the milestones: 0.7.1, 0.7.0 Feb 27, 2020
tullie pushed a commit to tullie/pytorch-lightning that referenced this pull request Apr 3, 2020
* Caching MNIST dataset for testing

* Added MNIST datset to the tests directory

* Caches dataset based off hash of the test.pt file

* Cleaned Up yml file

* Cleaned Up yml file

* Removed MNIST Data from framework

* Set cache key for dataset to 'mnist'

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci Continuous Integration feature Is an improvement or enhancement ready PRs ready to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test fails with _lazy_train_dataloader
4 participants