
Inaccurate results when using LARS optimizer #6523

Closed · amitz25 opened this issue Mar 15, 2021 · 2 comments
Labels: 3rd party, bug, help wanted

Comments

amitz25 commented Mar 15, 2021

Hi, after converting my repo to the Lightning format I could not reproduce the training losses I used to get (and the model failed to converge). After some debugging, it seems there is a conflict between Lightning and the torchlars optimizer (https://github.com/kakaobrain/torchlars).
The problem occurs only when combining newer versions of PyTorch Lightning with the torchlars optimizer.

After a bit of black-box testing, here are some workarounds I found (each reproduces the exact losses I used to get before moving my code to Lightning):

  1. Downgrading lightning to 1.0.3
  2. Using another optimizer
  3. Returning None in configure_optimizers() and stepping the optimizer myself (see the sketch after this list), e.g.:

       optimizer.zero_grad()
       loss.backward()
       optimizer.step()
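
A minimal sketch of workaround 3, assuming the optimizer is created in __init__ and stepped manually inside training_step; the network, loss, and hyperparameters here are illustrative placeholders, not from the original repo:

    import torch
    import torch.nn.functional as F
    import pytorch_lightning as pl
    from torchlars import LARS


    class ManualLARSModule(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.model = torch.nn.Linear(32, 2)  # placeholder network
            # Wrap a base optimizer with LARS and keep a reference ourselves.
            self.optimizer = LARS(torch.optim.SGD(self.model.parameters(), lr=0.1))

        def training_step(self, batch, batch_idx):
            x, y = batch
            loss = F.cross_entropy(self.model(x), y)
            # Step manually so Lightning's closure logic (which torchlars
            # ignores) is bypassed entirely.
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()
            return loss

        def configure_optimizers(self):
            # Returning None tells Lightning to skip automatic optimization.
            return None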

Thanks

@amitz25 added the bug and help wanted labels Mar 15, 2021

amitz25 commented Mar 16, 2021

After some more debugging, it turns out the problem is that torchlars doesn't call the closure that Lightning passes to step() before performing its own operations.
I'm guessing the inspection in this fix doesn't catch this case:
#4981 (comment)
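
For context, the failure mode looks roughly like this (a hypothetical optimizer written to mimic the symptom, not torchlars' actual code). Lightning wraps the forward and backward passes in a closure and hands it to step(); if the optimizer never invokes that closure, the update runs on stale or zeroed gradients:

    import torch


    class ClosureIgnoringOptimizer(torch.optim.SGD):
        """Hypothetical optimizer reproducing the torchlars symptom."""

        def step(self, closure=None):
            # BUG: the closure (Lightning's forward + backward) is never
            # called, so no fresh gradients exist when the update runs.
            return super().step()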

If anyone else encounters this issue, the easy fix is to override the optimizer_step hook and call the closure yourself (instead of passing it on to the optimizer):

def optimizer_step(
    self,
    current_epoch,
    batch_nb,
    optimizer,
    optimizer_i,
    second_order_closure,
    on_tpu,
    using_native_amp,
    using_lbfgs,
):
    # Run the closure (forward + backward) ourselves, since torchlars
    # won't call it from inside step().
    second_order_closure()
    optimizer.step()
    optimizer.zero_grad()
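
Note that the hook signature above matches the Lightning version the author appears to be on; newer releases renamed several of these arguments (for example, second_order_closure became optimizer_closure), so check the signature of your installed version before copying this.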

@edenlightning added the 3rd party label Mar 19, 2021
ananyahjha93 (Contributor) commented:

@amitz25 please reach out to the Kakao Brain team to get the torchlars optimizer updated according to the latest torch optimizer requirements! Any recent torch Optimizer that follows the preset guidelines should work with Lightning.
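
For reference, the guideline being invoked is PyTorch's closure-accepting step() contract: an optimizer's step(closure=None) should invoke the closure, when one is given, to recompute the loss and gradients before updating the parameters. A minimal sketch of a conforming optimizer (a plain SGD update, for illustration only):

    import torch


    class ConformingOptimizer(torch.optim.Optimizer):
        """Sketch of the step(closure) contract PyTorch optimizers follow."""

        def __init__(self, params, lr=0.1):
            super().__init__(params, dict(lr=lr))

        def step(self, closure=None):
            loss = None
            if closure is not None:
                # Re-evaluate the model and populate .grad before the
                # update, which is exactly the call torchlars was skipping.
                with torch.enable_grad():
                    loss = closure()
            for group in self.param_groups:
                for p in group["params"]:
                    if p.grad is not None:
                        p.data.add_(p.grad, alpha=-group["lr"])
            return loss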
