Docs: Add note about version counter in ModelCheckpoint #20146


Merged: 8 commits, Aug 4, 2024
9 changes: 9 additions & 0 deletions docs/source-pytorch/common/checkpointing_intermediate.rst
@@ -83,6 +83,15 @@ Which
filename="sample-mnist-{epoch:02d}-{global_step}",
)

.. note::

It is recommended that you pass formatting options to ``filename``, as shown in the example above, so that the
filename includes the monitored metric. Otherwise, if ``save_top_k >= 2`` and ``enable_version_counter=True``
(the default), a version suffix is appended to the ``filename`` whenever a new checkpoint would collide with an
existing file. Do not rely on this version to retrieve a particular top-k model, because the version count has
no relationship to model performance. For example, ``filename-v2.ckpt`` does not necessarily hold the second-best model.
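The following sketch illustrates why the version suffix says nothing about rank. It is a simplified, hypothetical model loosely based on the behavior described in the note, not Lightning's implementation: ``next_free_name`` and ``save_top_k`` are illustrative helpers, the metric is minimized (as with ``mode="min"``), and a new checkpoint takes the first free name among ``base.ckpt``, ``base-v1.ckpt``, ``base-v2.ckpt``, and so on.

```python
"""Hypothetical model of a filename version counter (not Lightning's code)."""


def next_free_name(base: str, existing: set[str]) -> str:
    """Return ``base.ckpt`` or the first free ``base-vN.ckpt`` (N starts at 1)."""
    name = f"{base}.ckpt"
    version = 1
    while name in existing:
        name = f"{base}-v{version}.ckpt"
        version += 1
    return name


def save_top_k(scores: list[float], base: str, k: int) -> dict[str, float]:
    """Keep the k lowest scores, always using the same filename template."""
    kept: dict[str, float] = {}
    for score in scores:
        if len(kept) == k:
            worst = max(kept, key=kept.get)
            if score >= kept[worst]:
                continue  # not better than the current worst; skip saving
            del kept[worst]  # deleting the worst checkpoint frees its name
        kept[next_free_name(base, set(kept))] = score
    return kept


kept = save_top_k([0.9, 0.2, 0.6, 0.4], base="filename", k=3)
print(sorted(kept, key=kept.get))  # best to worst
# prints ['filename-v1.ckpt', 'filename.ckpt', 'filename-v2.ckpt']
```

Because the fourth checkpoint reuses the freed ``filename.ckpt`` slot, the best model ends up in ``filename-v1.ckpt`` and ``filename-v2.ckpt`` holds the worst of the three.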


- You can customize the checkpointing behavior to monitor any quantity of your training or validation steps. For example, if you want to update your checkpoints based on your validation loss:

|
4 changes: 3 additions & 1 deletion src/lightning/pytorch/callbacks/model_checkpoint.py
@@ -94,7 +94,9 @@ class ModelCheckpoint(Checkpoint):
Please note that the monitors are checked every ``every_n_epochs`` epochs.
If ``save_top_k >= 2`` and the callback is called multiple times inside an epoch, and the filename remains
unchanged, the name of the saved file will be appended with a version count starting with ``v1`` to avoid
collisions unless ``enable_version_counter`` is set to False. The version counter is unrelated to the top-k
ranking of the checkpoint, and we recommend formatting the filename to include the monitored metric to avoid
collisions.
mode: one of {min, max}.
If ``save_top_k != 0``, the decision to overwrite the current save file is made
based on either the maximization or the minimization of the monitored quantity.
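The docstring recommends putting the monitored metric into the filename. A minimal sketch of why that avoids most collisions, using plain ``str.format`` substitution: the ``format_name`` helper, the ``val_loss`` key, and the template are hypothetical, and Lightning's own formatting differs in details (for example, it can insert the metric name into the filename, controlled by ``auto_insert_metric_name``).

```python
"""Hypothetical sketch: filenames that embed the metric rarely collide."""


def format_name(template: str, metrics: dict[str, float]) -> str:
    # Plain str.format substitution; this is not Lightning's formatter.
    return template.format(**metrics) + ".ckpt"


template = "sample-mnist-{epoch:02d}-{val_loss:.2f}"
history = [
    {"epoch": 0, "val_loss": 0.912},
    {"epoch": 0, "val_loss": 0.734},  # second save inside the same epoch
    {"epoch": 1, "val_loss": 0.651},
]
names = [format_name(template, m) for m in history]
print(names)
# prints ['sample-mnist-00-0.91.ckpt', 'sample-mnist-00-0.73.ckpt', 'sample-mnist-01-0.65.ckpt']
```

Two saves in the same epoch still get distinct names because their losses differ. A collision remains possible if two values round to the same string under ``.2f``, which is when the version counter would apply again.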