Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow changing the logged step value in validation_step #4130

Merged
merged 4 commits into from
Oct 21, 2020
Merged

Allow changing the logged step value in validation_step #4130

merged 4 commits into from
Oct 21, 2020

Conversation

mauvilsa
Copy link
Contributor

@mauvilsa mauvilsa commented Oct 13, 2020

What does this PR do?

This fixes the bug identified in the discussion of #4102

Fixes #4102
Fixes #4203

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

@rohitgr7 please review.

@mergify mergify bot requested a review from a team October 13, 2020 20:54
@mauvilsa
Copy link
Contributor Author

mauvilsa commented Oct 13, 2020

I see that unit tests are now failing. There are asserts that check what has been logged and it now differs because epoch is now automatically logged. But I understood that having epoch is a feature. What should be done here? Change the tests?

@rohitgr7
Copy link
Contributor

But I understood that having epoch is a feature. What should be done here? Change the tests?

not sure about this. let's see what other reviewers say.

@rohitgr7 rohitgr7 requested review from awaelchli and Borda October 13, 2020 21:37
@williamFalcon
Copy link
Contributor

i don't think this is a bug... again, you shouldn't really log validation each step... it means nothing.

@mauvilsa
Copy link
Contributor Author

@williamFalcon let me explain the change. This is not about logging validation each step. The hard coding of the step argument to log_metrics in logger_connector.py#L202 is unnecessary since if omitted it would get the same value anyway because of logger_connector.py#L90. However, this hard coding prevents the user to override the value of step, making inconsistent the behavior between training and validation.

The only issue is the automatic logging of epoch in logger_connector.py#L89 which makes some unit tests fail. Maybe this automatic logging should be off by default and only enabled if requested by the user?

Comment on lines 87 to +90
elif step is None:
# added metrics by Lightning for convenience
scalar_metrics['epoch'] = self.trainer.current_epoch
step = step if step is not None else self.trainer.global_step
step = self.trainer.global_step
Copy link
Contributor

@rohitgr7 rohitgr7 Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe #4132 tried to solve the problem of logging epoch on the x-axis in case of *_epoch_end, not sure why it was closed. IMO this should be improved so that by default it should set global_step for step_logs and current_epoch for epoch_logs and in case if step is provided in the metrics by the user, it should be replaced with that value. WDYT? Let's see what other has to say here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Even better if for epoch_logs the default is the current epoch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From feedback it is clear that the default should be global_step.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!
let's update the test accordingly, shall we? :)

Copy link
Contributor

@awaelchli awaelchli Oct 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please note that the log_metrics method here is recursive.
At the bottom, it is calling itself with step (before PR changes) but now step is None. This means in the recursive call, step will be None and then it will add the epoch key. That is why you see the epoch key in the test as extra. Just wanted to point that out in case this was not clear.

@mergify mergify bot requested a review from a team October 14, 2020 10:26
@awaelchli awaelchli changed the title Fix to bug identified in issue #4102 [wip] allow changing the logged step value in validation_step [skip ci] Oct 16, 2020
@awaelchli awaelchli added the feature Is an improvement or enhancement label Oct 16, 2020
@awaelchli
Copy link
Contributor

I think the automatic epoch logging should stay. It is very useful to have that by default, because it tells us how many steps correspond to one epoch, and in advanced loggers it allows us to change the axis of plots to epoch. So my suggestion is to adjust the tests accordingly.

@awaelchli awaelchli added the bug Something isn't working label Oct 17, 2020
Copy link
Contributor

@awaelchli awaelchli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works! the epoch is back <3

@mergify mergify bot requested a review from a team October 17, 2020 15:10
@codecov
Copy link

codecov bot commented Oct 17, 2020

Codecov Report

Merging #4130 into master will increase coverage by 3%.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #4130    +/-   ##
=======================================
+ Coverage      90%     93%    +3%     
=======================================
  Files         103     103            
  Lines        7795    7848    +53     
=======================================
+ Hits         6990    7281   +291     
+ Misses        805     567   -238     

@rohitgr7 rohitgr7 changed the title [wip] allow changing the logged step value in validation_step [skip ci] Allow changing the logged step value in validation_step [skip ci] Oct 17, 2020
@Borda Borda added this to the 1.0.x milestone Oct 20, 2020
@edenlightning edenlightning modified the milestones: 1.0.3, 1.1 Oct 20, 2020
@mergify
Copy link
Contributor

mergify bot commented Oct 20, 2020

This pull request is now in conflict... :(

@mergify mergify bot requested a review from a team October 21, 2020 09:23
@Borda Borda requested a review from rohitgr7 October 21, 2020 13:49
@Borda Borda changed the title Allow changing the logged step value in validation_step [skip ci] Allow changing the logged step value in validation_step Oct 21, 2020
@mergify
Copy link
Contributor

mergify bot commented Oct 21, 2020

This pull request is now in conflict... :(

Copy link
Contributor

@rohitgr7 rohitgr7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mauvilsa thanks for the fix. Also next time pls make a branch(not master) and push? would be easier to test locally :)

@rohitgr7 rohitgr7 merged commit 546476c into Lightning-AI:master Oct 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working feature Is an improvement or enhancement
Projects
None yet
7 participants