Rethinking x/optym's interface #109

Open

brandondube opened this issue Dec 27, 2023 · 2 comments

@brandondube
Owner

The "core" of x/optym is the convention def thing(fg: callable), where fg returns (cost, grad) based on the parameter vector x.

This is somewhat restrictive: a gradient-free optimizer will just do f, _ = fg(x), and the computation of g is wasted. There are also circumstances where a line searcher or similar wants only the gradient; there the computation of f is wasted. Of course, when using backprop, f is essentially free along the way to computing g, but sometimes the gradient is known or computable without f (for example, the Rosenbrock function).
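
For concreteness, a minimal sketch of the current convention, using scipy's rosen and rosen_der as the test problem (the same functions imported in the example further down):

from scipy.optimize import rosen, rosen_der

def rosen_fg(x):
    # the current x/optym convention: one callable returning (cost, grad)
    return rosen(x), rosen_der(x)

A gradient-free optimizer is then forced to evaluate and discard the gradient: f, _ = rosen_fg(x).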

It places a greater burden on the user, but it may be superior to change fg to something like optimizable, which has roughly the shape:

type Optimizable interface {
    f(vector) float
    g(vector) vector
    h(vector) array

    // optional
    fg(vector) (float, vector)
    fgh(vector) (float, vector, array)
}
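
In Python terms this could be written as a typing.Protocol; purely an illustrative sketch, the method names mirror the pseudocode above and none of this exists in x/optym today:

from typing import Protocol
import numpy as np

class Optimizable(Protocol):
    def f(self, x: np.ndarray) -> float: ...       # cost
    def g(self, x: np.ndarray) -> np.ndarray: ...  # gradient
    def h(self, x: np.ndarray) -> np.ndarray: ...  # hessian

    # optional fused evaluations, essentially free when using backprop
    def fg(self, x: np.ndarray) -> tuple[float, np.ndarray]: ...
    def fgh(self, x: np.ndarray) -> tuple[float, np.ndarray, np.ndarray]: ...

In practice, optimizers would duck-type against this with hasattr checks rather than require every method.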

Then each optimizer can just check if not hasattr(o, 'g'): raise ValueError('<myoptimizer> requires the gradient'). In principle we could fall back to finite differences, but I think that just leads to unhappy or confused users who use finite differences for problems with ~a dozen dimensions, then view it as impossible for something like a million dimensions when it would have been perfectly doable with backprop. Forcing the user to opt in with a forward_differences(f, x0, eps=1e-9) and central_differences(f, x0, eps=1e-9) pair of functions could help avoid this.
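
A sketch of what those opt-in helpers could look like (forward differences shown; central differences is analogous, evaluating at x ± eps):

import numpy as np

def forward_differences(f, x0, eps=1e-9):
    # approximate the gradient of f at x0; one extra f evaluation per dimension
    x0 = np.asarray(x0, dtype=float)
    f0 = f(x0)
    g = np.zeros_like(x0)
    for i in range(x0.size):
        xp = x0.copy()
        xp[i] += eps
        g[i] = (f(xp) - f0) / eps
    return g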

i.e., one might do

from scipy.optimize import rosen, rosen_der

class Rosenbrock:
    def f(self, x):
        return rosen(x)

    def g(self, x):
        # analytic gradient:
        # return rosen_der(x)
        # or an explicit opt-in to finite differences:
        return forward_differences(self.f, x)

I think this would be preferable in order to enable something like Nelder-Mead for functions that, for example, do not strictly have a gradient. In principle we could also look for h_j_prod(vector) vector, but I sincerely hope I never implement optimizers that want the Hessian-Jacobian product.
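
For the gradient-free path, the same shape works with only f defined; a sketch (nothing here is an existing x/optym class):

import numpy as np

class RoundedQuadratic:
    # piecewise-constant cost; no meaningful gradient exists
    def f(self, x):
        return float(np.sum(np.round(x) ** 2))

A Nelder-Mead style optimizer would accept this object, while a gradient-based one would raise ValueError because g is missing.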

Thoughts @Jashcraf ?

@Jashcraf
Contributor

Personally I don't think it's a big deal to throw a ValueError for an optimizer that requires a gradient.

Something I don't really understand - why would you want the gradient for an optimizer that doesn't require one (e.g. Nelder-Mead)?

@brandondube
Owner Author

Something I don't really understand - why would you want the gradient for an optimizer that doesn't require one (e.g. Nelder-Mead)?

The intent is actually to modify the interface so that the gradient is optional in the most general sense, but a gradient-based optimizer would error if it's not available.
