
Add a centered variance option to the ClippedAdam optimizer #3415

Open · wants to merge 5 commits into dev

Conversation

@BenZickel (Contributor) commented Jan 21, 2025

Problem

When using the ClippedAdam optimizer on a model where gradient stability is highly imbalanced across parameters, the convergence rate of the parameters with stable gradients is slower than it could be.

Solution

Add an option to use the centered variance in the denominator of the step size calculation. Parameters with stable gradients will have a lower centered variance than the current uncentered variance, and will therefore get a larger step size and a higher convergence rate.
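
For illustration, here is a minimal sketch of the idea as an Adam-style update (this is not the PR's actual implementation; buffer names follow torch.optim.Adam conventions, and the centered_variance flag is an assumed name):

```python
import torch


def adam_like_step(param, grad, exp_avg, exp_avg_sq, step, lr=1e-3,
                   betas=(0.9, 0.999), eps=1e-8, centered_variance=False):
    """Sketch of one Adam-style update with an optional centered variance."""
    beta1, beta2 = betas
    # Exponential moving averages of the gradient and the squared gradient.
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    # Standard Adam bias corrections (unchanged in the centered case).
    m_hat = exp_avg / (1 - beta1**step)
    v_hat = exp_avg_sq / (1 - beta2**step)
    if centered_variance:
        # Centered variance: subtract the squared bias-corrected mean.
        # For a parameter with stable gradients E[g]^2 is close to E[g^2],
        # so the denominator shrinks and the effective step size grows.
        v_hat = (v_hat - m_hat**2).clamp(min=0)  # clamping is just a guard in this sketch
    param.add_(-lr * m_hat / (v_hat.sqrt() + eps))
```

The only change relative to the uncentered update is the denominator; the numerator and the bias corrections stay the same.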

Testing

The improvement in convergence rate is shown below (taken from the test function run with plotting enabled):

  • The first plot shows the number of iterations needed to reach convergence, where convergence is defined as reaching the ultimate loss plus a small threshold.
  • The second plot shows the convergence rate, i.e. the mean per-iteration improvement of the gap between the loss and the ultimate loss, which is roughly proportional to the inverse of the number of iterations needed to reach convergence (a rough sketch of this computation follows the figure). Note also that the convergence rate is less sensitive to changes in the learning rate with the centered variance than with the uncentered variance.
  • The third plot shows the ultimate loss reached; for regular Adam with uncentered variance, the best possible loss (zero in this case) is not attained for small learning rates within the allotted number of iterations.
    [Figure: iterations to convergence, convergence rate, and ultimate loss as a function of learning rate, for centered vs. uncentered variance]
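
As a rough sketch of how these metrics can be computed from a recorded loss trace (illustrative only; the actual test code in this PR may differ, and loss_vec, tail_len, and threshold are assumed names):

```python
import torch


def convergence_metrics(loss_vec, tail_len=100, threshold=0.01):
    """Illustrative convergence metrics from a per-iteration loss trace."""
    # Ultimate loss: average loss over the tail of the trace.
    ultimate_loss = loss_vec[-tail_len:].mean()
    # Gap between the loss and the ultimate loss at each iteration.
    gap = (loss_vec - ultimate_loss).clamp(min=1e-12)
    # Iterations needed to get within `threshold` of the ultimate loss.
    converged = gap <= threshold
    convergence_iter = int(converged.nonzero()[0]) if converged.any() else len(loss_vec)
    # Convergence rate: mean per-iteration log improvement of the gap,
    # roughly proportional to the inverse of the iterations to convergence.
    convergence_rate = (gap[:-1] / gap[1:]).log().mean()
    return ultimate_loss, convergence_rate, convergence_iter
```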

@martinjankowiak (Collaborator) commented

@BenZickel can you please explain your figure? i don't know how a convergence rate is computed, and i can't tell if the differences in the second plot are significant given the scale

@martinjankowiak (Collaborator) commented

a bit of googling led me here. the same algo in essence? a 2 second scan suggests they do bias correction

https://edoc.hu-berlin.de/server/api/core/bitstreams/14960a8d-4c35-4d08-86d7-1e130ecd42c8/content

@BenZickel (Contributor, Author) commented

Thanks for the review @martinjankowiak.

  • I've added an additional figure and explanations about the plots. I think the first plot is the most important, as it shows the number of iterations needed in order to reach convergence.
  • Regarding the reference you provided, the algorithm described is indeed the same, so I've added it to the references section. They describe the algorithm as using the true variance instead of the second moment of the gradient ("One has to note that Adam does not actually calculate the variance but the second moment instead. If E[gt] = 0, both definitions are identical, and fewer operations are required to calculate the second moment."). As for bias correction, they conclude that it does not need to change ("Finally, from the bias-corrected expression for vt, we conclude that the bias correction for cAdam is the same as that of Adam.").
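
In symbols (a sketch of the relation being discussed, written in standard Adam notation rather than the exact expressions of the paper or the PR):

$$
\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^t}, \qquad
\hat{c}_t = \hat{v}_t - \hat{m}_t^2,
$$

where the centered option uses $\sqrt{\hat{c}_t}$ instead of $\sqrt{\hat{v}_t}$ in the denominator of the update, the bias-correction factors are the same as in standard Adam, and the two denominators coincide when $\mathbb{E}[g_t] = 0$.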

@BenZickel (Contributor, Author) commented

I've added the centered variance option to the Latent Dirichlet Allocation (LDA) example. When running some tests I've noticed that the centered variance option improves both the convergence rate and the ultimate loss over a wide range of learning rates. Additionally, the same phenomenon seen in the example above, of reduced sensitivity of the convergence rate to changes in the learning rate, can be observed when using the centered variance option.

The centered variance option in the LDA example can be used by running

python pyro-ppl/examples/lda.py -cv True
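
For reference, using the option directly in an SVI setup would look roughly like the sketch below (the centered_variance keyword name is an assumption and should be checked against the PR's ClippedAdam signature; the toy model is only for illustration):

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import ClippedAdam


def model(data):
    # Toy model: a single latent location with Gaussian observations.
    loc = pyro.sample("loc", dist.Normal(0.0, 1.0))
    with pyro.plate("data", len(data)):
        pyro.sample("obs", dist.Normal(loc, 1.0), obs=data)


def guide(data):
    loc_q = pyro.param("loc_q", torch.tensor(0.0))
    pyro.sample("loc", dist.Normal(loc_q, 0.1))


data = torch.randn(100) + 3.0

# The centered_variance flag is the new option added by this PR (name assumed).
optimizer = ClippedAdam({"lr": 0.01, "clip_norm": 10.0, "centered_variance": True})
svi = SVI(model, guide, optimizer, loss=Trace_ELBO())

for step in range(1000):
    svi.step(data)
```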


 Small modification to the Adam algorithm implemented in torch.optim.Adam
-to include gradient clipping and learning rate decay.
+to include gradient clipping and learning rate decay and an option to use
+the centered variance.
Review comment (Collaborator):

can you point to the ref here?

@@ -435,3 +435,105 @@ def step(svi, optimizer):
actual.append(step(svi, optimizer))

assert_equal(actual, expected)


def test_centered_clipped_adam(plot_results=False):
Review comment (Collaborator):

how long does this test take?

loss_vec.append(loss)
return torch.Tensor(loss_vec)

def calc_convergence(loss_vec, tail_len=100, threshold=0.01):
Review comment (Collaborator):

comment what is being computed?

convergence_rate = (convergence_vec[:-1] / convergence_vec[1:]).log().mean()
return ultimate_loss, convergence_rate, convergence_iter

def get_convergence_vec(lr_vec, centered_variance):
Review comment (Collaborator):

comment what is being computed?

@martinjankowiak (Collaborator) commented

thanks @BenZickel ! the motivation makes sense, and i can imagine how this might help. i'm perhaps somewhat surprised by the size of the effect, though i guess your w has quite a range in magnitude, perhaps a larger range than we might expect to see in most scenarios
