Save ModelCheckpoint's last.ckpt as symlink if possible #18748
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## master #18748 +/- ##
==========================================
- Coverage 83% 49% -34%
==========================================
Files 439 431 -8
Lines 34469 34324 -145
==========================================
- Hits 28706 16871 -11835
- Misses 5763 17453 +11690
This PR shouldn't close #4335, which advocates for splitting the ModelCheckpoint class into smaller pieces with separate functionality, instead of the current mess of all flags interacting with each other in a single class.
Co-authored-by: Carlos Mocholí <[email protected]>
What does this PR do?
Fixes #18670
Fixes #14973
Part of #4335
This PR changes the ModelCheckpoint's behavior when save_last=True:
- If save_top_k != 0 and save_last=True, then last.ckpt will be a symlink to the latest top-k checkpoint.
- If save_top_k == 0, then last.ckpt remains a regular checkpoint.

This improves the user experience by providing a deterministic file name from which to load the last checkpoint. LLM checkpoints can be > 100 GB, and saving a copy every time is not only time-consuming but also wastes disk space.
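The idea of linking instead of copying can be illustrated with a minimal standard-library sketch. This is not the code from this PR: the helper name link_last_checkpoint and the fallback to a plain copy when symlinks are unavailable are assumptions made for illustration only.

```python
import os
import shutil


def link_last_checkpoint(latest_ckpt: str, last_path: str) -> bool:
    """Make last_path refer to latest_ckpt.

    Hypothetical helper (not part of Lightning). Returns True if a symlink
    was created, False if the sketch fell back to copying the file.
    """
    # Remove a previous last.ckpt, whether it is a regular file or a
    # (possibly dangling) symlink; lexists does not follow links.
    if os.path.lexists(last_path):
        os.remove(last_path)
    try:
        # A relative target keeps the link valid if the whole checkpoint
        # directory is moved or mounted elsewhere.
        link_dir = os.path.dirname(os.path.abspath(last_path))
        target = os.path.relpath(os.path.abspath(latest_ckpt), link_dir)
        os.symlink(target, last_path)
        return True
    except (OSError, NotImplementedError):
        # Assumed fallback: some platforms or filesystems do not permit
        # symlinks, so copy the checkpoint instead.
        shutil.copy(latest_ckpt, last_path)
        return False
```

With a symlink, last.ckpt costs almost no extra disk space or write time, while loading from it still reads the contents of the latest top-k checkpoint.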
cc @Borda @carmocca @awaelchli