schedule_free: fix broadcasting of scalar arrays to 1d arrays #1042

Merged: 7 commits into google-deepmind:main on Sep 4, 2024

Conversation

@n-gao (Contributor) commented Sep 2, 2024

Currently, the momentum b1 is stored in a 1D array of shape (1,). We should instead store it in a scalar array of shape () to avoid broadcasting scalar parameters to (1,) in schedule_free_eval_params.
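For context, this follows from NumPy-style broadcasting: an operation between a shape-(1,) array and a scalar array yields a shape-(1,) result. A minimal sketch of the effect (illustrative variable names, not optax internals):

import jax.numpy as jnp

b1_1d = jnp.ones((1,))  # momentum stored with shape (1,)
b1_0d = jnp.ones(())    # momentum stored with shape ()
x = jnp.ones(())        # a scalar parameter

(b1_1d * x).shape  # (1,) -- the scalar parameter is broadcast up to 1D
(b1_0d * x).shape  # ()   -- the parameter's shape is preserved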

Before:

import jax.numpy as jnp
import optax

opt = optax.contrib.schedule_free_adamw()
x = jnp.ones(())
state = opt.init(x)
optax.contrib.schedule_free_eval_params(state, x).shape
# (1,)

After:

import jax.numpy as jnp
import optax

opt = optax.contrib.schedule_free_adamw()
x = jnp.ones(())
state = opt.init(x)
optax.contrib.schedule_free_eval_params(state, x).shape
# ()

@fabianp (Member) commented Sep 2, 2024

Your solution seems reasonable to me. However, there are now some doctest errors in optax/contrib/_schedule_free.py, probably because they are now included in the docs.

@n-gao (Contributor Author) commented Sep 2, 2024

@fabianp I am quite unfamiliar with the docs. I thought this would be a simple change. Is there some documentation on this? Otherwise, I can also remove the doc changes.

@fabianp (Member) commented Sep 2, 2024

You might not need to build the docs (although if you want to, it's described in the README). Just check the errors from the failing CI (https://github.com/google-deepmind/optax/actions/runs/10664454490/job/29555745356?pr=1042). As you can see, the issue seems to be that some examples in the docstrings use schedule_free_eval_params instead of the full name optax.contrib.schedule...
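For illustration only (a hedged sketch, not the actual docstring), a fully qualified doctest example would look roughly like this:

>>> import optax
>>> opt = optax.contrib.schedule_free_adamw()  # full name resolves in the doctest
>>> # a bare schedule_free_adamw() would raise a NameError when the docs run it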

let me know if this doesn't make sense

@n-gao (Contributor Author) commented Sep 2, 2024

I haven't touched any line related to schedule_free_eval_params. The other entries in docs/api/contrib.rst don't use complete paths either, do they? Let me check whether the tests pass if I remove the lines again.

@n-gao (Contributor Author) commented Sep 2, 2024

I don't get why the tests are failing. It works locally, and the change seems unrelated. @fabianp, do you have another idea?

@fabianp (Member) commented Sep 2, 2024

You can undo the changes in docs/api/contrib.rst if you want, since they are orthogonal to this PR.

@n-gao (Contributor Author) commented Sep 2, 2024

done

@fabianp (Member) commented Sep 2, 2024

Can you also add a test showing that the new approach doesn't have the broadcasting problem?

@n-gao (Contributor Author) commented Sep 2, 2024

I added a test that fails before and succeeds after the PR.

@@ -164,5 +164,16 @@ def run(opt):
     params_wrapper = run(opt_wrapper)
     chex.assert_trees_all_close(params_shortcut, params_wrapper)
 
+  @parameterized.parameters(*_OPTIMIZERS_UNDER_TEST)
+  def test_scalar_preservance(self, opt_name, opt_kwargs):
+    opt = getattr(alias, opt_name)(learning_rate=0.0, **opt_kwargs)
Review comment (Member):
Perhaps call this base_opt? Otherwise both the base and the wrapper have the same name, which is confusing.

@@ -164,5 +164,16 @@ def run(opt):
     params_wrapper = run(opt_wrapper)
     chex.assert_trees_all_close(params_shortcut, params_wrapper)
 
+  @parameterized.parameters(*_OPTIMIZERS_UNDER_TEST)
+  def test_scalar_preservance(self, opt_name, opt_kwargs):
Review comment (Member):
The name test_scalar_preservance is not very descriptive. Please either use a more precise name for the test or add a comment below it explaining what behavior the function is testing.

@n-gao (Contributor Author) commented Sep 3, 2024

Added a comment and changed the variable name. Though this criticism probably applies to all the other tests in that file, too.

@fabianp (Member) commented Sep 3, 2024

excellent, thanks!

copybara-service bot merged commit 896cb88 into google-deepmind:main on Sep 4, 2024. 8 checks passed.