feat: add DFP quasi newton method #125
base: dev
Conversation
Nice! Great to get this going.
The update formulas look good to me, and I think this is a nice way to extend the class to other flavours of quasi-Newton updates. To make this process even easier in the future, I suggest we settle on a signature for the update formulas, perhaps encoded through an abstract base class. They should all take the same inputs and return the same outputs, so that users can swap in their own without having to make changes to the wrapper function. I left some notes about this :)
I also agree on making this a separate PR - we can deal with the termination conditions later, and this keeps it focused. Thanks!
Lastly, I'm uncertain about changing the name, since `AbstractBFGS` is part of the public API. Willing to be convinced here!
optimistix/_solver/quasi_newton.py (Outdated)

```python
)


def _update_hessian(update_formula, f_eval, grad, prev_grad, y_diff, hessian, hessian_inv) -> _Hessian:
```
Can we define `bfgs_update` and `dfp_update` such that they have the same signature (take the same arguments)? They don't have to use all of the same arguments, but they should accept them.
Then we don't need this wrapper function here, which will allow us to more cleanly introduce other flavours of quasi-Newton updates in the future, such as for SLSQP.
I'd like this to be something like

```python
class AbstractBFGS(...):
    ...
    update: AbstractQuasiNewtonUpdate
    ...

    def step(...):
        ...
        def accepted(descent_state):
            ...
            f_eval_info = self.update(...)
            ...
```
where `self.update` takes the same parameters, regardless of the formula used. This could be achieved with an abstract base class, such as
```python
class AbstractQuasiNewtonUpdate(eqx.Module, strict=True):
    use_inverse: bool

    @abc.abstractmethod
    def _update(self, ...):
        """Update formula..."""

    def __call__(self, ...):
        return self._update(...)
```
This would require some ironing out of the input arguments, so that we can support all the flavours we want without boxing ourselves in. This probably means switching to passing versions of `FunctionInfo`, which carry most of the attributes, rather than passing everything individually. For example, we could pass `f_info: Union[FunctionInfo.EvalGradHessian, FunctionInfo.EvalGradHessianInv]` and `f_eval_info: FunctionInfo.EvalGrad`, plus `y` and `state.y_eval`, and that should have us covered for the cases I can think of right now.
It would also define a clean interface others can use to implement their own favourite version. WDYT?
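(For illustration only - not part of the PR - a concrete update plugging into this proposed interface might look roughly like the sketch below. It assumes the `f_info`/`f_eval_info`/`y`/`y_eval` arguments suggested above, omits equinox's strict mode for brevity, and elides the actual numerics.)

```python
import abc

import equinox as eqx


class AbstractQuasiNewtonUpdate(eqx.Module):
    """Common interface for interchangeable quasi-Newton update formulas (sketch)."""

    use_inverse: bool

    @abc.abstractmethod
    def _update(self, f_info, f_eval_info, y, y_eval):
        """Return updated function information carrying the new (inverse) Hessian."""

    def __call__(self, f_info, f_eval_info, y, y_eval):
        # Shared entry point: every flavour is called the same way by the solver.
        return self._update(f_info, f_eval_info, y, y_eval)


class DFPUpdate(AbstractQuasiNewtonUpdate):
    """DFP flavour; only the shape of the interface is shown here."""

    def _update(self, f_info, f_eval_info, y, y_eval):
        ...  # compute the step and gradient differences and the DFP rank-two correction
```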
That's a great idea, I'll add this! 👍
Done! But...

Unfortunately, I couldn't find a clean way to provide `use_inverse` as an argument to the update formulas, since that would e.g. allow passing `use_inverse=True` to the solver but `use_inverse=False` to the update formula, which would be a broken state.

I'm also not 100% sure the function signatures are as you envision them, e.g. I couldn't use `f_eval_info` (because that's the output of this Hessian update).
I think if we have `use_inverse` in the update, then we should no longer have it in the solver. It only makes sense to have it in one place, I think.

Right now we only use it to determine how to compute the update; the descents decide what to do based on the function information they are provided. So I think that makes it a good candidate to live in the update class! Or do you find it more ergonomic to make it an argument that the update takes instead?
Regarding the flavour of function information to pass here - I would go with `FunctionInfo.EvalGrad`; look at how we create it on the fly for the search:

optimistix/optimistix/_solver/bfgs.py, line 224 in 482f9a2:

```python
FunctionInfo.Eval(f_eval),
```
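(For what it's worth, the gradient-carrying analogue would then presumably be built the same way - this is just a guess at the call, assuming `FunctionInfo.EvalGrad` wraps the function value and gradient in the same style as `FunctionInfo.Eval` above:)

```python
# Hypothetical: mirrors FunctionInfo.Eval(f_eval) above, additionally carrying the gradient.
f_eval_info = FunctionInfo.EvalGrad(f_eval, grad)
```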
> Right now we only use it to determine how to compute the update; the descents decide what to do based on the function information they are provided. So I think that makes it a good candidate to live in the update class! Or do you find it more ergonomic to make it an argument that the update takes instead?

I can do that, but it would break the interface of `BFGS` once more. I think this change makes sense though! Are you ok with that?
> Regarding the flavour of function information to pass here - I would go with `FunctionInfo.EvalGrad`; look at how we create it on the fly for the search:

Ok, I'll have a look 👍
I think we're already making a break here anyway, so if you agree that it makes sense, then let's move that into the update class.
optimistix/_solver/quasi_newton.py (Outdated)

```diff
@@ -235,8 +271,8 @@ def accepted(descent_state):
             else:
                 hessian = state.f_info.hessian
                 hessian_inv = None
-            f_eval_info = _bfgs_update(
-                f_eval, grad, state.f_info.grad, hessian, hessian_inv, y_diff
+            f_eval_info = _update_hessian(
```
See above - I think it would be a little neater to call a `self.update` that takes a defined list of arguments and can be swapped without requiring changes to the wrapper function.
optimistix/_solver/quasi_newton.py (Outdated)

```python
if hessian is None:
    # Inverse Hessian
    assert hessian_inv is not None
    # DFP update to the operator directly
```
This looks good to me! Just `inner` should be moved into this function proper if we switch to a unified signature.
optimistix/_solver/__init__.py (Outdated)

```python
from .quasi_newton import (
    AbstractQuasiNewton as AbstractQuasiNewton,
    BFGS as BFGS,
    bfgs_update as bfgs_update,
```
Small thing: I'm not sure if the update formulas themselves need to be public given that they already define a solver, but no strong feelings on this.
I wasn't 100% sure about this at first either, but I think they have to be public in order to construct custom solvers. Users need to be able to pass these functions to custom solvers for the new argument (`hessian_update`) - that's why I exposed them.
Makes sense!
```diff
@@ -32,9 +32,9 @@ In addition to the following, note that the [Optax](https://github.com/deepmind/

 ---

-??? abstract "`optimistix.AbstractBFGS`"
+??? abstract "`optimistix.AbstractQuasiNewton`"
```
I'm a little undecided on renaming this. `AbstractBFGS` is part of the public API, and users might have defined their own custom solvers on top of it, which makes me hesitant to change the name. On the other hand, it is mathematically and conceptually cleaner, and we do have at least one other quasi-Newton update formula planned (SLSQP). I'd appreciate your perspective on this, @patrick-kidger!
I'd like to add that I'd definitely vote for updating the name of the internal state. It would be quite confusing to use `optx.DFP` and work with a `BFGS` state.

For consistency (and a clean interface) I think it would be nice to rename the abstract base class as well. Maybe we could keep `AbstractBFGS` in addition, for backwards compatibility (and drop it at some point in a future release)?
I see your point! And we could either use a type alias or an import alias for backward compatibility. Ideally decorated with a deprecation warning.
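(For illustration, an import alias carrying a deprecation warning could look roughly like the hypothetical module-level `__getattr__` hook below - though, as the next comment notes, this route is dropped because the attributes change too.)

```python
# Hypothetical sketch, e.g. in optimistix/__init__.py, once AbstractQuasiNewton exists there.
import warnings


def __getattr__(name):
    if name == "AbstractBFGS":
        warnings.warn(
            "`optimistix.AbstractBFGS` has been renamed to `optimistix.AbstractQuasiNewton`.",
            DeprecationWarning,
            stacklevel=2,
        )
        return AbstractQuasiNewton  # the renamed class, imported elsewhere in the module
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```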
Actually, we can't do this, because we're changing attributes. I think a clear description of how to upgrade is the best way to go.
@johannahaffner and I just had a quick chat about this one. I think I agree -- let's (a) rename to `AbstractQuasiNewton`, and (b) not have any `AbstractQuasiNewton.use_inverse` attribute (since it's no longer needed at the solver level).

I don't have strong feelings on whether we also alias `AbstractBFGS = AbstractQuasiNewton`.
Hi @johannahaffner, I've updated the code following your suggestions. This implementation was the best I could come up with. Could you give it another look? Thanks 🙏
optimistix/_solver/quasi_newton.py (Outdated)

```python
def no_update(self, use_inverse, inner, grad_diff, y_diff, f_info):
    if use_inverse:
        return f_info.hessian_inv
    return f_info.hessian
```
Nit: can we use an else here for readability?
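i.e. something like:

```python
def no_update(self, use_inverse, inner, grad_diff, y_diff, f_info):
    if use_inverse:
        return f_info.hessian_inv
    else:
        return f_info.hessian
```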
Nice! I think this is going to be a very nice feature. Left some comments here and there :)
Do we know anything about the properties of DFP vs BFGS? Specifically, whether one or the other is more suitable for a certain class of problems. It would be nice to add a sentence or two to their respective documentation, so that users get an idea of which one might be a better choice for the problem they are trying to solve.
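(For reference, not from this thread: writing $s$ for the step `y_diff` and $g$ for the gradient difference `grad_diff`, the textbook updates to the Hessian approximation $B$ (BFGS) and to its inverse $H$ (DFP) are

$$
B_{+} = B - \frac{B s s^\top B}{s^\top B s} + \frac{g g^\top}{g^\top s},
\qquad
H_{+} = H - \frac{H g g^\top H}{g^\top H g} + \frac{s s^\top}{g^\top s},
$$

so the two are exactly dual under swapping $s \leftrightarrow g$ and $B \leftrightarrow H$. The usual reason given in the literature for preferring BFGS is that it tends to correct poor Hessian approximations more effectively and is more tolerant of inexact line searches.)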
Likewise, I think we could explicate in `AbstractQuasiNewton` that this class is compatible with different flavours of quasi-Newton updates, and that these may be chosen among implementations of `AbstractQuasiNewtonUpdate`.
```python
        return f_info.hessian

    @abc.abstractmethod
    def update(
```
Can we move all of the logic into the update method? (And then call `filter_cond` on inner functions.) We can also do the conversion to lineax operators in there, I think.

The reason for this is that computing `y_diff` and `grad_diff`, as well as the inner product, doesn't necessarily generalise to all quasi-Newton updates, e.g. to those that work with limited memory (L-BFGS), or to the ones that are modified to account for constraint information (SLSQP). These should still work with the appropriate flavour of function information.

I should sit down with pen and paper to confirm that last point (pretty sure, though!); I will do that later tonight.
So, long story short: keep `__call__` as generic as possible.
I can make `__call__` an abstract method and move all of the logic of `update` into the concrete `__call__` implementations directly. That's as generic as it can be, I think. Is this what you have in mind?
That works as well, I think. The main thing would be to not put restrictions on future implementations of this update class.
```diff
             hessian_inv = None
-            f_eval_info = _bfgs_update(
-                f_eval, grad, state.f_info.grad, hessian, hessian_inv, y_diff
+            f_eval_info = self.hessian_update(
```
I like how streamlined this now looks!
I think I addressed all of your comments now @johannahaffner :)
Thanks!
I did go over the requirements we would have for L-BFGS and SLSQP, and it seems that we're good when it comes to the information we need to pass to the update class! Everything can be communicated through the appropriate flavours of function information. (If I run into issues related to dual variables for this, then I'll figure those out myself.)
Lastly, I think it would be nice to add the new `AbstractQuasiNewtonUpdate` to the documentation as well, directly in `minimise.md`. We also list different methods to compute the `NonlinearCG` update there, so I think it would fit nicely. (We could do with some indents, though.)
```python
This is a quasi-Newton optimisation algorithm, whose defining feature is the way
it progressively builds up a Hessian approximation using multiple steps of gradient
information.
```
Could we add something like:
> [optimistix.BFGS][] is generally preferred, since it is more numerically stable on most problems.
I have not convinced myself of this in the sense that I derived the formula or anything, but it does seem to be the general consensus in the literature. (Interestingly, BFGS as an alternative to DFP seems to have been developed independently, rather than jointly, by Broyden, Fletcher, Goldfarb and Shanno - and they all published their work in 1970.)
Sure, I'll add this to the docs 👍
> (Interestingly, BFGS as an alternative to DFP seems to have been developed independently, rather than jointly, by Broyden, Fletcher, Goldfarb and Shanno - and they all published their work in 1970.)
Interesting, I didn't know about this!
Just realised this - sorry for not spotting it earlier!
```python
self.descent = NewtonDescent(linear_solver=lx.Cholesky())
# TODO(raderj): switch out `BacktrackingArmijo` with a better line search.
self.search = BacktrackingArmijo()
self.hessian_update = DFPUpdate(use_inverse=True)
```
I went over the tests again and just spotted this - can we revert to the old behaviour regarding `use_inverse` here? That is

```python
def __init__(self, ..., use_inverse: bool = True):
    ...
    self.hessian_update = BFGSUpdate(use_inverse=use_inverse)
```
Users should definitely be able to choose among these without having to subclass!
Once we're reverting here, the solver definitions in `helpers.py` can be simplified again.
Hm, I can do that, but there are two arguments in favour of how it is now, I'd say:

1. Subclassing becomes a bit weird, as it's possible to provide `use_inverse` to the solver and to the Hessian update. That is quite confusing, I'd say, and one may even pass two different booleans to them, which results in a broken state of the solver.
2. The inverse is only used in the update, so there's no logical need for it to exist on the solver. Hm, ok, but the same argument goes for `rtol, atol, norm, ...`, so this is not a strong argument.

One idea could be that `use_inverse` is an argument to `AbstractQuasiNewtonUpdate.__call__`, and not to the class instance itself?

Do you have any ideas on how to resolve (1.)? I can revert it then :)
Ah, I see that I should have been more clear! I don't want to revert to making it a solver attribute, only to it being an argument of `__init__` for the concrete classes (BFGS and DFP), which is passed on to the update when it is initialised.

This would not affect the behaviour of the abstract class, which is what people should subclass.
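(Concretely, something along these lines for the DFP solver, mirroring the `BFGSUpdate` sketch above - a hypothetical sketch only: the argument list and defaults are placeholders, and `AbstractQuasiNewton`/`DFPUpdate` are the classes introduced by this PR rather than the released API.)

```python
class DFP(AbstractQuasiNewton):
    def __init__(self, rtol, atol, norm=max_norm, use_inverse: bool = True):
        self.rtol = rtol
        self.atol = atol
        self.norm = norm
        self.descent = NewtonDescent(linear_solver=lx.Cholesky())
        self.search = BacktrackingArmijo()
        # `use_inverse` lives on the update, but stays selectable from the solver's __init__.
        self.hessian_update = DFPUpdate(use_inverse=use_inverse)
```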
Sounds good! I'll do that :)
I realise I've been pretty silent on this PR so far. It looks like things are shaping up pretty nicely -- let me know if(/when) you think this is done, and I'll go over it!
Can we merge this into a dev branch? I would then add L-BFGS on top of this before we do the next release. Cheers!
Hi @johannahaffner and @patrick-kidger,
Hi @johannahaffner & @patrick-kidger,
Also, I updated this PR with the recent changes in main. From my side, this is ready for review.
I've just updated the target branch to `dev`.
Awesome, thanks!
Hi @johannahaffner,

as discussed earlier, I added the `update_formula` argument in order to switch between `DFP` and `BFGS`. As both are of the class "quasi-Newton methods", I went ahead and changed the abstract base class name, and renamed the source file accordingly.

I wasn't sure how granular you'd like the type signature of the `bfgs_update` and `dfp_update` functions to be, so I went with a simple blank `Callable` for now. I also added `optx.DFP` to the test suite (but no custom variants, as is currently done for BFGS). I've also not yet added the `termination criteria` argument - I thought it makes sense to separate these into two different PRs, as it doesn't really have anything to do with the DFP algorithm.

Let me know what you think!
Best, Peter
(closes #123)