Allow changing the logged step value in validation_step #4130
I see that unit tests are now failing. There are asserts that check what has been logged, and it now differs because [...]
Not sure about this. Let's see what other reviewers say.
I don't think this is a bug... again, you shouldn't really log validation each step... it means nothing.
@williamFalcon let me explain the change. This is not about logging validation each step. The hard coding of the [...] The only issue is the automatic logging of `epoch` in logger_connector.py#L89, which makes some unit tests fail. Maybe this automatic logging should be off by default and only enabled if requested by the user?
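Here is a minimal sketch of the opt-in idea. It assumes a hypothetical `log_epoch` flag, which is not an existing Lightning option. The `epoch` key is added only when the caller asks for it, so a test that compares logged keys sees exactly what the user logged.

```python
from typing import Dict, Optional, Tuple


def build_logged_metrics(
    metrics: Dict[str, float],
    step: Optional[int],
    global_step: int,
    current_epoch: int,
    log_epoch: bool = False,  # hypothetical opt-in flag, off by default
) -> Tuple[Dict[str, float], int]:
    """Return the metrics to log and the step to log them at."""
    logged = dict(metrics)  # copy so the caller's dict is not mutated
    if log_epoch:
        # convenience key added only when explicitly requested
        logged["epoch"] = current_epoch
    # fall back to the trainer's global step when no step was given
    resolved_step = step if step is not None else global_step
    return logged, resolved_step


logged, s = build_logged_metrics({"val_loss": 0.5}, step=None, global_step=120, current_epoch=3)
print(logged, s)  # {'val_loss': 0.5} 120
```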
```diff
 elif step is None:
     # added metrics by Lightning for convenience
     scalar_metrics['epoch'] = self.trainer.current_epoch
-    step = step if step is not None else self.trainer.global_step
+    step = self.trainer.global_step
```
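The `elif step is None:` branch, with `step = self.trainer.global_step` as its final assignment, can be read as the standalone function below. This is a sketch, not Lightning's actual `log_metrics`. The parameters `scalar_metrics`, `current_epoch` and `global_step` stand in for the dictionary and trainer attributes in the hunk. The branch that precedes `elif` is not shown in the excerpt, so the sketch leaves it out.

```python
from typing import Dict, Optional, Tuple


def resolve_step(
    scalar_metrics: Dict[str, float],
    step: Optional[int],
    current_epoch: int,
    global_step: int,
) -> Tuple[Dict[str, float], int]:
    """Mirror the `elif step is None:` branch from the review hunk."""
    metrics = dict(scalar_metrics)  # copy so the input is not mutated
    if step is None:
        # added metrics by Lightning for convenience
        metrics["epoch"] = current_epoch
        step = global_step
    # an explicitly passed step is returned unchanged, without "epoch"
    return metrics, step


print(resolve_step({"val_acc": 0.9}, step=None, current_epoch=2, global_step=50))
# ({'val_acc': 0.9, 'epoch': 2}, 50)
print(resolve_step({"val_acc": 0.9}, step=7, current_epoch=2, global_step=50))
# ({'val_acc': 0.9}, 7)
```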
I believe #4132 tried to solve the problem of logging epoch on the x-axis in the case of `*_epoch_end`; not sure why it was closed. IMO this should be improved so that, by default, it sets `global_step` for `step_logs` and `current_epoch` for `epoch_logs`, and if `step` is provided in the metrics by the user, the default is replaced with that value. WDYT? Let's see what others have to say here.
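This sketch shows the rule proposed in the comment above. It assumes two things that are not Lightning APIs: a hypothetical `is_epoch_log` flag that tells `*_epoch_end` logs apart from per-step logs, and a `step` key that the user may put in the metrics dict.

```python
from typing import Dict, Tuple


def default_x_axis(
    metrics: Dict[str, float],
    is_epoch_log: bool,  # hypothetical: True for *_epoch_end logs
    global_step: int,
    current_epoch: int,
) -> Tuple[Dict[str, float], int]:
    """Pick the x-axis value for one batch of logged metrics."""
    metrics = dict(metrics)  # copy so the caller's dict is not mutated
    if "step" in metrics:
        # a user-provided step overrides any default
        step = int(metrics.pop("step"))
    elif is_epoch_log:
        step = current_epoch
    else:
        step = global_step
    return metrics, step


print(default_x_axis({"val_loss": 0.3}, True, 500, 4))                  # epoch log -> x = 4
print(default_x_axis({"train_loss": 0.8}, False, 500, 4))               # step log -> x = 500
print(default_x_axis({"val_loss": 0.3, "step": 42.0}, True, 500, 4))    # user step -> x = 42
```

The user-supplied key is popped rather than copied, so `step` does not also show up as a plotted metric.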
I agree. Even better if for epoch_logs the default is the current epoch.
From the feedback, it is clear that the default should be `global_step`.
Nice!
Let's update the test accordingly, shall we? :)
Please note that the `log_metrics` method here is recursive.
At the bottom, it calls itself with `step` (before the PR changes), but now `step` is `None`. This means that in the recursive call `step` will be `None`, so the `epoch` key gets added. That is why you see the extra `epoch` key in the test. Just wanted to point that out in case it was not clear.
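The toy model below illustrates why the extra key appears. It is not the real method: `log_metrics_toy` and its `_inner` flag are hypothetical, and the real recursion condition is not shown here. The outer call forwards `step` unchanged into one inner call. The inner call adds `epoch` only when the forwarded `step` is `None`.

```python
from typing import Dict, List, Optional, Tuple

# each entry: (metrics dict actually logged, step it was logged at)
LOGGED: List[Tuple[Dict[str, float], int]] = []


def log_metrics_toy(
    metrics: Dict[str, float],
    step: Optional[int],
    current_epoch: int,
    global_step: int,
    _inner: bool = False,
) -> None:
    """Toy model: the outer call forwards `step` to one recursive inner call."""
    if not _inner:
        # recursive call with whatever `step` the outer call received
        log_metrics_toy(metrics, step, current_epoch, global_step, _inner=True)
        return
    scalar_metrics = dict(metrics)
    if step is None:
        # convenience key, added only when no step reaches the inner call
        scalar_metrics["epoch"] = current_epoch
        step = global_step
    LOGGED.append((scalar_metrics, step))


# forwarding a concrete step: no extra key
log_metrics_toy({"val_loss": 0.4}, step=10, current_epoch=1, global_step=10)
# forwarding step=None: the inner call adds "epoch"
log_metrics_toy({"val_loss": 0.4}, step=None, current_epoch=1, global_step=10)
print(LOGGED)
# [({'val_loss': 0.4}, 10), ({'val_loss': 0.4, 'epoch': 1}, 10)]
```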
I think the automatic epoch logging should stay. It is very useful to have by default, because it tells us how many steps correspond to one epoch, and in advanced loggers it lets us switch the plot axis to epoch. So my suggestion is to adjust the tests accordingly.
Works! The epoch is back <3
Codecov Report
```diff
@@           Coverage Diff           @@
##           master    #4130   +/-   ##
=======================================
+ Coverage      90%      93%     +3%
=======================================
  Files         103      103
  Lines        7795     7848     +53
=======================================
+ Hits         6990     7281    +291
+ Misses        805      567    -238
```
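Codecov reports line coverage as hits divided by tracked lines. This report lists no partial lines, so hits plus misses equal lines. Here is a quick check of the rounded percentages in the Codecov report:

```python
def coverage_pct(hits: int, misses: int) -> float:
    """Line coverage in percent, assuming there are no partial lines."""
    return 100.0 * hits / (hits + misses)


base = coverage_pct(6990, 805)  # master: 6990 + 805 = 7795 lines
head = coverage_pct(7281, 567)  # #4130: 7281 + 567 = 7848 lines
print(f"{base:.1f}% -> {head:.1f}% ({head - base:+.1f} points)")
# 89.7% -> 92.8% (+3.1 points)
```

The report rounds these to 90% and 93%, which gives the +3% delta it shows.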
This pull request is now in conflict... :(
@mauvilsa thanks for the fix. Also, next time could you please push to a branch (not master)? It would be easier to test locally :)
What does this PR do?
This fixes the bug identified in the discussion of #4102
Fixes #4102
Fixes #4203
PR review
@rohitgr7 please review.