wandb logger problem with on_step log on validation #4980
Comments
Hi! Thanks for your contribution, great first issue!
This is not surprising. What exactly are you trying to log in each step of validation?
Hi @awaelchli, thanks for the prompt answer!
My suggestion in this case would be to gather your validation metrics in an array and log them as a histogram at the end.
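A minimal sketch of that suggestion, with plain functions standing in for the `LightningModule` hooks so it runs without `pytorch_lightning` installed; the `wandb.Histogram` call shown in the comment is the real wandb API, while names like `val_loss` are illustrative:

```python
# Accumulate per-batch validation losses and summarize them once per
# epoch, instead of logging each batch with on_step=True.

def validation_step(batch_loss, outputs):
    # Stand-in for LightningModule.validation_step: collect, don't log.
    outputs.append(batch_loss)

def validation_epoch_end(outputs):
    # Stand-in for LightningModule.validation_epoch_end.
    mean_loss = sum(outputs) / len(outputs)
    # In a real LightningModule with a WandbLogger this would be e.g.:
    #   self.logger.experiment.log({"val_loss_hist": wandb.Histogram(outputs)})
    #   self.log("val_loss", mean_loss)   # aggregated value for callbacks
    return {"val_loss": mean_loss, "val_losses": list(outputs)}

outputs = []
for loss in (0.9, 0.7, 0.5):
    validation_step(loss, outputs)
summary = validation_epoch_end(outputs)
```

Logging one histogram per epoch keeps the per-batch information without emitting a wandb step for every validation batch.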
Thanks @borisdayma!!
When training on ImageNet, the validation epoch takes a long time. For this reason I would have liked to output a per-batch validation loss, as well as an aggregated per-epoch loss for early stopping and checkpointing. It wasn't obvious to me from the docs that this would be handled any differently than during training, until I ran into the same issues as @andreaRapuzzi. However, I understand why this is tricky and why different people would want it handled differently. It doesn't seem like it would require a lot of code to change (namely EvaluationLoop.__log_result_step_metrics()), but from a design perspective it may not be so simple to come up with the best way to handle the different use cases.
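For reference, the two granularities described above map onto a single `self.log("val_loss", loss, on_step=True, on_epoch=True)` call in a `LightningModule`, whose default epoch-level reduction is a mean over the step values. A plain-Python sketch of that reduction (the function name is made up for illustration):

```python
# What on_step=True / on_epoch=True would emit for one validation epoch:
# every per-batch value, plus their mean at epoch end (Lightning's
# default reduction). Reproduced here without pytorch_lightning.

def log_step_and_epoch(step_losses):
    per_step = list(step_losses)               # on_step=True: each batch value
    per_epoch = sum(per_step) / len(per_step)  # on_epoch=True: mean reduction
    return per_step, per_epoch

steps, epoch_loss = log_step_and_epoch([0.8, 0.6, 0.4])
```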
@collinmccarthy An alternative (a bit hacky) would be to consider how often you log during training.
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!
This was fixed with #5931. |
🐛 Bug

When logging in the `validation_step` with `on_step=True` and `on_epoch=False`, the following happens: two metrics are logged on `validation_step`, the first with `on_step=False` and `on_epoch=True`, the second with `on_step=True` and `on_epoch=False` (red in the image below). As you can see, the training chart is affected by this:

[chart image lost in extraction]

Please reproduce using the colab link at the top of this article.

To Reproduce

Just change the `validation_step` logging like this:

[code snippet lost in extraction]