Introduce ckpt_path="hpc" keyword for checkpoint loading #14911

Merged: 10 commits, Sep 29, 2022

@otaj commented Sep 27, 2022

What does this PR do?

Implements the first two points from #13773 (comment)

Does your PR introduce any breaking changes? If yes, please list them.

HPC checkpoints are no longer loaded first by default. They are loaded only when the user explicitly passes ckpt_path="hpc". There is one exception: if the user is running with SLURMEnvironment, leaves ckpt_path=None (the default value), and an HPC checkpoint is found, that checkpoint is selected automatically.

I'm not sure whether we need more tests around this; if so, I will implement them.
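
The selection rule described above can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: the function name select_ckpt_path and its parameters hpc_ckpt_path and is_slurm are hypothetical, and raising an error when "hpc" is requested but no HPC checkpoint exists is an assumption.

```python
from typing import Optional


def select_ckpt_path(
    ckpt_path: Optional[str],
    hpc_ckpt_path: Optional[str],
    is_slurm: bool,
) -> Optional[str]:
    """Hypothetical helper mirroring the rule in the PR description.

    ckpt_path:     value passed by the user (None is the default).
    hpc_ckpt_path: path of a discovered HPC checkpoint, or None if none exists.
    is_slurm:      True when running under a SLURM environment.
    """
    if ckpt_path == "hpc":
        # Explicit request: the HPC checkpoint is loaded regardless of environment.
        if hpc_ckpt_path is None:
            # Assumption: an explicit request with nothing to load is an error.
            raise ValueError('ckpt_path="hpc" was passed but no HPC checkpoint was found')
        return hpc_ckpt_path

    if ckpt_path is None and is_slurm and hpc_ckpt_path is not None:
        # Default value under SLURM: the HPC checkpoint is picked up automatically.
        return hpc_ckpt_path

    # Any other value is passed through unchanged; HPC checkpoints no longer win.
    return ckpt_path
```

In this sketch, a user-supplied path always takes precedence over a discovered HPC checkpoint, which is the breaking change: before this PR the HPC checkpoint was loaded first. In Lightning itself the keyword would be given as the ckpt_path argument, for example to Trainer.fit.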

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

cc @Borda @awaelchli @ananthsub @ninginthecloud @rohitgr7 @otaj @justusschock @akihironitta

@otaj added the feature, checkpointing, environment: slurm, and breaking change labels Sep 27, 2022
@otaj added this to the pl:1.8 milestone Sep 27, 2022
@otaj requested a review from tchaton as a code owner September 27, 2022 17:48
@otaj self-assigned this Sep 27, 2022
@github-actions bot added the pl label Sep 27, 2022
@otaj requested a review from justusschock as a code owner September 27, 2022 17:51
@mergify bot added the ready label Sep 27, 2022
otaj commented Sep 29, 2022

Those pesky docstrings 😂. I updated them, @carmocca; let me know if anything more should be said there.

@otaj enabled auto-merge (squash) September 29, 2022 11:24

@tchaton left a comment

LGTM !

@otaj merged commit 5f0c4aa into master Sep 29, 2022
@otaj deleted the feature/hpc_checkpoint_keyword branch September 29, 2022 12:45
Labels

  • breaking change (Includes a breaking change)
  • checkpointing (Related to checkpointing)
  • environment: slurm
  • feature (Is an improvement or enhancement)
  • pl (Generic label for PyTorch Lightning package)
  • ready (PRs ready to be merged)

4 participants