The LR scheduler currently maintains two global states to implement the full LR warmup and decay.
We want to remove them:
"nit: we can still keep these two as function arguments below, but use
functools.partial to pack them when adding to the optimizer.
Global config works for now. This can be done as a follow-up."
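
A minimal sketch of what that suggestion could look like. The names `linear_warmup_linear_decay`, `warmup_steps`, and `decay_steps` are hypothetical stand-ins for the scheduler function and the two values currently held as globals; the schedule shape is also an assumption, not necessarily the one used in this code:

```python
import functools


def linear_warmup_linear_decay(
    current_step: int, warmup_steps: int, decay_steps: int
) -> float:
    """Return a multiplicative LR factor for the given step.

    Hypothetical schedule: assumes warmup_steps and decay_steps are positive.
    """
    if current_step < warmup_steps:
        # Linear warmup: ramp from 1/warmup_steps up to 1.0.
        return (current_step + 1) / warmup_steps
    # Linear decay: fall from 1.0 toward 0.0 over decay_steps, then stay at 0.0.
    progress = (current_step - warmup_steps) / decay_steps
    return max(0.0, 1.0 - progress)


# Bind the two former globals once, at the point where the scheduler is built.
# The result is a one-argument callable: step -> LR factor.
lr_lambda = functools.partial(
    linear_warmup_linear_decay, warmup_steps=4, decay_steps=8
)

factors = [lr_lambda(step) for step in range(13)]
```

If the scheduler is built on PyTorch, a callable like `lr_lambda` could be passed as the `lr_lambda` argument of `torch.optim.lr_scheduler.LambdaLR`, so no module-level state is needed.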