Has there been any research on how this strategy interacts with a learning rate schedule? Especially for something extreme like the one-cycle policy (super convergence). It seems like the history of the scale of the gradient would be dominated by changes in the learning rate. I found a paper that touches on the subject but doesn't propose any theory for, or solution to, the interaction between the two.
From https://hal.science/hal-03891707v1/file/Learning_rate_scheduling_and_gradient_clipping_for_audio_source_separation.pdf:

> As expected, AutoClip doesn't interact well with cosine annealing
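For context on why a schedule could dominate the statistic: AutoClip sets the clipping threshold to a percentile of the full history of gradient norms, so any trend in norm magnitude over training (e.g. one driven by the learning rate schedule) directly shifts the threshold. A minimal NumPy sketch of the percentile-clipping step (function name, epsilon, and default percentile are illustrative, not from the AutoClip codebase):

```python
import numpy as np

def autoclip_step(grad_norm_history, new_grad_norm, percentile=10.0):
    """One AutoClip-style step (sketch): record the latest gradient norm,
    then compute the clip threshold as the p-th percentile of the whole
    history, and the scale factor that would be applied to the gradient."""
    grad_norm_history.append(new_grad_norm)
    clip_value = float(np.percentile(grad_norm_history, percentile))
    # Gradient is rescaled only when its norm exceeds the threshold
    scale = min(1.0, clip_value / (new_grad_norm + 1e-12))
    return clip_value, scale
```

Because the percentile is taken over the *entire* history, a long stretch of large early norms (high-LR phase) keeps the threshold anchored to that regime even after the schedule has annealed the norms down, which is consistent with the interaction the paper observes.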