
[Train] Update checkpoint path for RayTrainReportCallbacks. #40174

Merged



woshiyyya commented Oct 6, 2023

Why are these changes needed?

Our docs didn't explicitly explain how to retrieve the original checkpoint files when using the report callbacks provided by Ray Train. This PR adds code snippets that show users how to do so.
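A minimal sketch of the retrieval step, assuming the Lightning layout documented in this PR (a `checkpoint.ckpt` file inside each Ray Train checkpoint directory). The helper `lightning_ckpt_path` is hypothetical and not part of Ray; it uses only the standard library so it runs without Ray installed.

```python
import os
import tempfile

# Assumption: per the structure documented in this PR, the PyTorch Lightning
# checkpoint is stored as "checkpoint.ckpt" inside each Ray Train checkpoint dir.
LIGHTNING_CKPT_NAME = "checkpoint.ckpt"


def lightning_ckpt_path(ray_ckpt_dir: str, name: str = LIGHTNING_CKPT_NAME) -> str:
    """Return the path of the original framework checkpoint file.

    ``ray_ckpt_dir`` is a local directory holding one Ray Train checkpoint.
    Raises FileNotFoundError if the expected file is missing.
    """
    path = os.path.join(ray_ckpt_dir, name)
    if not os.path.exists(path):
        raise FileNotFoundError(f"No {name!r} found in {ray_ckpt_dir!r}")
    return path


if __name__ == "__main__":
    # Build a throwaway directory that mimics the documented layout:
    #   checkpoint_000000/
    #   └─ checkpoint.ckpt
    with tempfile.TemporaryDirectory() as root:
        ckpt_dir = os.path.join(root, "checkpoint_000000")
        os.makedirs(ckpt_dir)
        open(os.path.join(ckpt_dir, LIGHTNING_CKPT_NAME), "wb").close()
        print(lightning_ckpt_path(ckpt_dir))
```

With Ray installed, the directory would typically come from the `Checkpoint.as_directory()` context manager (for example `with result.checkpoint.as_directory() as ckpt_dir: ...`), and the returned path could then be passed to your `LightningModule` subclass's `load_from_checkpoint`. Treat the file name as an assumption to verify against your Ray version.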

Related issue number

Closes #40082

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: woshiyyya <[email protected]>
woshiyyya marked this pull request as ready for review October 7, 2023 00:14
woshiyyya changed the title from "[Train] Update user guides for RayTrainReportCallbacks." to "[Train] Update checkpoint path for RayTrainReportCallbacks." Oct 10, 2023
woshiyyya added the tests-ok label (The tagger certifies test failures are unrelated and assumes personal liability.) Oct 11, 2023
Resolved review threads (outdated):
  • doc/source/train/doc_code/checkpoints.py (2 threads)
  • python/ray/train/lightning/_lightning_utils.py (2 threads)
woshiyyya and others added 3 commits October 11, 2023 20:09
Co-authored-by: matthewdeng <[email protected]>
Signed-off-by: Yunxuan Xiao <[email protected]>
Signed-off-by: woshiyyya <[email protected]>
Comment on lines 226 to 231
Checkpoints will be saved in the following structure:

.. testcode::

    # checkpoint_00000*/ Ray Train Checkpoint
    # └─ checkpoint.ckpt PyTorch Lightning Checkpoint

matthewdeng commented Oct 16, 2023


Can you try this:

Suggested change
Before:

    Checkpoints will be saved in the following structure:

    .. testcode::

        # checkpoint_00000*/ Ray Train Checkpoint
        # └─ checkpoint.ckpt PyTorch Lightning Checkpoint

After:

    Checkpoints will be saved in the following structure::

        checkpoint_00000*/ Ray Train Checkpoint
        └─ checkpoint.ckpt PyTorch Lightning Checkpoint

https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#literal-blocks
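The `checkpoint_00000*/` pattern in the suggested literal block implies one numbered directory per reported checkpoint. The sketch below picks the highest-numbered one from a run directory on disk; the `checkpoint_<index>` naming and the helper `latest_lightning_ckpt` are assumptions for illustration, not Ray API. When going through Ray's `Result` object, `result.checkpoint` already refers to the latest checkpoint, so this is only useful when inspecting files by hand.

```python
import os
import re
import tempfile

# Assumption: checkpoint directories follow the "checkpoint_00000*" pattern
# shown above, i.e. "checkpoint_" followed by a zero-padded integer index.
_CKPT_DIR_RE = re.compile(r"^checkpoint_(\d+)$")


def latest_lightning_ckpt(run_dir: str, name: str = "checkpoint.ckpt") -> str:
    """Return <run_dir>/checkpoint_<highest index>/<name> (existence not checked)."""
    indexed = []
    for entry in os.listdir(run_dir):
        match = _CKPT_DIR_RE.match(entry)
        if match and os.path.isdir(os.path.join(run_dir, entry)):
            # Compare numerically so that checkpoint_000010 beats checkpoint_000009.
            indexed.append((int(match.group(1)), entry))
    if not indexed:
        raise FileNotFoundError(f"No checkpoint_* directories in {run_dir!r}")
    _, latest = max(indexed)
    return os.path.join(run_dir, latest, name)


if __name__ == "__main__":
    # Mimic a run directory with three reported checkpoints.
    with tempfile.TemporaryDirectory() as run_dir:
        for i in range(3):
            d = os.path.join(run_dir, f"checkpoint_{i:06d}")
            os.makedirs(d)
            open(os.path.join(d, "checkpoint.ckpt"), "wb").close()
        print(os.path.relpath(latest_lightning_ckpt(run_dir), run_dir))
```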

matthewdeng merged commit 89eb6da into ray-project:master Oct 17, 2023
29 of 41 checks passed
Linked issue closed by this PR: [train] HuggingFace transformers RayTrainReportCallback saves contents under a nested checkpoint folder