feat: add DFP quasi newton method #125
base: dev
Conversation
Nice! Great to get this going.
The update formulas look good to me, and I think this is a nice way to extend the class to other flavours of quasi-Newton updates. To make this process even easier in the future, I suggest we settle on a signature for the update formulas, perhaps encoded through an abstract base class. They should all take the same inputs and return the same outputs, so that users can swap in their own without having to make changes to the wrapper function. I left some notes about this :)
I also agree on making this a separate PR - we can deal with the termination conditions later, and this keeps it focused. Thanks!
Lastly, I'm uncertain about changing the name, since `AbstractBFGS` is part of the public API. Willing to be convinced here!
optimistix/_solver/quasi_newton.py (Outdated)

```python
)


def _update_hessian(update_formula, f_eval, grad, prev_grad, y_diff, hessian, hessian_inv) -> _Hessian:
```
Can we define `bfgs_update` and `dfp_update` such that they have the same signature (take the same arguments)? They don't have to use all of the same arguments, but they should accept them.
Then we don't need this wrapper function here, which will allow us to more cleanly introduce other flavours of quasi-Newton updates in the future, such as for SLSQP.
I'd like this to be something like

```python
class AbstractBFGS(...):
    ...
    update: AbstractQuasiNewtonUpdate
    ...

    def step(...):
        ...
        def accepted(descent_state):
            ...
            f_eval_info = self.update(...)
            ...
```
where `self.update` takes the same parameters, regardless of the formula used. This could be achieved with an abstract base class, such as
```python
class AbstractQuasiNewtonUpdate(eqx.Module, strict=True):
    use_inverse: bool

    @abc.abstractmethod
    def _update(self, ...):
        """Update formula..."""

    def __call__(self, ...):
        return self._update(...)
```
This would require some ironing out of the input arguments, so that we can support all the flavours we want without boxing ourselves in. This probably means switching to passing versions of `FunctionInfo`, which carry most of the attributes, rather than passing everything individually. For example, we could pass `f_info: Union[FunctionInfo.EvalGradHessian, FunctionInfo.EvalGradHessianInv]` and `f_eval_info: FunctionInfo.EvalGrad`, plus `y` and `state.y_eval`, and that should have us covered for the cases I can think of right now.
It would also define a clean interface others can use to implement their own favourite version. WDYT?
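(For illustration only - not part of the PR - a concrete update plugging into this proposed interface might look roughly like the sketch below. It assumes the `f_info`/`f_eval_info`/`y`/`y_eval` arguments suggested above, omits equinox's strict mode for brevity, and elides the actual numerics.)

```python
import abc

import equinox as eqx


class AbstractQuasiNewtonUpdate(eqx.Module):
    """Common interface for interchangeable quasi-Newton update formulas (sketch)."""

    use_inverse: bool

    @abc.abstractmethod
    def _update(self, f_info, f_eval_info, y, y_eval):
        """Return updated function information carrying the new (inverse) Hessian."""

    def __call__(self, f_info, f_eval_info, y, y_eval):
        # Shared entry point: every flavour is called the same way by the solver.
        return self._update(f_info, f_eval_info, y, y_eval)


class DFPUpdate(AbstractQuasiNewtonUpdate):
    """DFP flavour; only the shape of the interface is shown here."""

    def _update(self, f_info, f_eval_info, y, y_eval):
        ...  # compute the step and gradient differences and the DFP rank-two correction
```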
That's a great idea, I'll add this! 👍
Done! But...

Unfortunately, I couldn't find a clean way to provide `use_inverse` as an argument to the update formulas, since that would e.g. allow passing `use_inverse=True` to the solver but `use_inverse=False` to the update formula, which would be a broken state.

I'm also not 100% sure the function signatures are as you envision them, e.g. I couldn't use `f_eval_info` (because that's the output of this Hessian update).
I think if we have `use_inverse` in the update, then we should no longer have it in the solver. It only makes sense to have it in one place, I think.

Right now we only use it to determine how to compute the update; the descents decide what to do based on the function information they are provided. So I think that makes it a good candidate to live in the update class! Or do you find it more ergonomic to make it an argument that the update takes instead?
Regarding the flavour of function information to pass here - I would go with `FunctionInfo.EvalGrad`; look at how we create it on the fly for the search:

optimistix/optimistix/_solver/bfgs.py, line 224 in 482f9a2:

```python
FunctionInfo.Eval(f_eval),
```
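(For what it's worth, the gradient-carrying analogue would then presumably be built the same way - this is just a guess at the call, assuming `FunctionInfo.EvalGrad` wraps the function value and gradient in the same style as `FunctionInfo.Eval` above:)

```python
# Hypothetical: mirrors FunctionInfo.Eval(f_eval) above, additionally carrying the gradient.
f_eval_info = FunctionInfo.EvalGrad(f_eval, grad)
```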
> Right now we only use it to determine how to compute the update; the descents decide what to do based on the function information they are provided. So I think that makes it a good candidate to live in the update class! Or do you find it more ergonomic to make it an argument that the update takes instead?

I can do that, but it would break the interface of `BFGS` once more. I think this change makes sense though! Are you ok with that?
> Regarding the flavour of function information to pass here - I would go with `FunctionInfo.EvalGrad`; look at how we create it on the fly for the search:

Ok, I'll have a look 👍
I think we're already making a break here anyway, so if you agree that it makes sense, then let's move that into the update class.
optimistix/_solver/quasi_newton.py (Outdated)

```diff
@@ -235,8 +271,8 @@ def accepted(descent_state):
             else:
                 hessian = state.f_info.hessian
                 hessian_inv = None
-            f_eval_info = _bfgs_update(
-                f_eval, grad, state.f_info.grad, hessian, hessian_inv, y_diff
+            f_eval_info = _update_hessian(
```
See above - I think it would be a little neater to call a `self.update` that takes a defined list of arguments and can be swapped without requiring changes to the wrapper function.
optimistix/_solver/quasi_newton.py (Outdated)

```python
if hessian is None:
    # Inverse Hessian
    assert hessian_inv is not None
    # DFP update to the operator directly
```
This looks good to me! Just `inner` should be moved into this function proper if we switch to a unified signature.
optimistix/_solver/__init__.py (Outdated)

```python
from .quasi_newton import (
    AbstractQuasiNewton as AbstractQuasiNewton,
    BFGS as BFGS,
    bfgs_update as bfgs_update,
```
Small thing: I'm not sure if the update formulas themselves need to be public given that they already define a solver, but no strong feelings on this.
I wasn't 100% sure about this at first either, but I think they have to be public in order to construct custom solvers. Users need to be able to pass these functions to custom solvers for the new argument (`hessian_update`) - that's why I exposed them.
Makes sense!
```diff
@@ -32,9 +32,9 @@ In addition to the following, note that the [Optax](https://github.com/deepmind/

 ---

-??? abstract "`optimistix.AbstractBFGS`"
+??? abstract "`optimistix.AbstractQuasiNewton`"
```
I'm a little undecided on renaming this. `AbstractBFGS` is part of the public API, and users might have defined their own custom solvers on top of it, which makes me hesitant to change the name. On the other hand, it is mathematically and conceptually cleaner, and we do have at least one other quasi-Newton update formula planned (SLSQP). I'd appreciate your perspective on this, @patrick-kidger!
I'd like to add that I'd definitely vote for updating the name of the internal state. It would be quite confusing to use `optx.DFP` and work with a `BFGS` state.

For consistency (and a clean interface) I think it would be nice to rename the abstract base class as well. Maybe we could keep `AbstractBFGS` in addition, for backwards compatibility (and drop it at some point in a future release)?
I see your point! And we could either use a type alias or an import alias for backward compatibility. Ideally decorated with a deprecation warning.
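(For illustration, an import alias carrying a deprecation warning could look roughly like the hypothetical module-level `__getattr__` hook below - though, as the next comment notes, this route is dropped because the attributes change too.)

```python
# Hypothetical sketch, e.g. in optimistix/__init__.py, once AbstractQuasiNewton exists there.
import warnings


def __getattr__(name):
    if name == "AbstractBFGS":
        warnings.warn(
            "`optimistix.AbstractBFGS` has been renamed to `optimistix.AbstractQuasiNewton`.",
            DeprecationWarning,
            stacklevel=2,
        )
        return AbstractQuasiNewton  # the renamed class, imported elsewhere in the module
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```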
Actually, we can't do this, because we're changing attributes. I think a clear description of how to upgrade is the best way to go.
@johannahaffner and I just had a quick chat about this one. I think I agree -- let's (a) rename to `AbstractQuasiNewton`, and (b) not have any `AbstractQuasiNewton.use_inverse` attribute (since it's no longer needed at the solver level).

I don't have strong feelings on whether we also alias `AbstractBFGS = AbstractQuasiNewton`.
Hi @johannahaffner, I've updated the code following your suggestions. This implementation was the best I could come up with. Could you give it another look? Thanks 🙏
optimistix/_solver/quasi_newton.py (Outdated)

```python
def no_update(self, use_inverse, inner, grad_diff, y_diff, f_info):
    if use_inverse:
        return f_info.hessian_inv
    return f_info.hessian
```
Nit: can we use an else here for readability?
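i.e. something like:

```python
def no_update(self, use_inverse, inner, grad_diff, y_diff, f_info):
    if use_inverse:
        return f_info.hessian_inv
    else:
        return f_info.hessian
```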
Nice! I think this is going to be a very nice feature. Left some comments here and there :)
Do we know anything about the properties of DFP vs BFGS? Specifically, whether one or the other is more suitable for a certain class of problems. It would be nice to add a sentence or two to their respective documentation, so that users get an idea of which one might be a better choice for the problem they are trying to solve.
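(For reference, not from this thread: writing $s$ for the step `y_diff` and $g$ for the gradient difference `grad_diff`, the textbook updates to the Hessian approximation $B$ (BFGS) and to its inverse $H$ (DFP) are

$$
B_{+} = B - \frac{B s s^\top B}{s^\top B s} + \frac{g g^\top}{g^\top s},
\qquad
H_{+} = H - \frac{H g g^\top H}{g^\top H g} + \frac{s s^\top}{g^\top s},
$$

so the two are exactly dual under swapping $s \leftrightarrow g$ and $B \leftrightarrow H$. The usual reason given in the literature for preferring BFGS is that it tends to correct poor Hessian approximations more effectively and is more tolerant of inexact line searches.)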
Likewise, I think we could explicate in `AbstractQuasiNewton` that this class is compatible with different flavours of quasi-Newton updates, and that these may be chosen among implementations of `AbstractQuasiNewtonUpdate`.
```python
        return f_info.hessian

    @abc.abstractmethod
    def update(
```
Can we move all of the logic into the update method? (And then call `filter_cond` on inner functions.) We can also do the conversion to lineax operators in there, I think.

The reason for this is that computing `y_diff` and `grad_diff`, as well as the inner product, doesn't necessarily generalise to all quasi-Newton updates, e.g. to those that work with limited memory (L-BFGS), or to the ones that are modified to account for constraint information (SLSQP). These should still work with the appropriate flavour of function information.

I should sit down with pen and paper to confirm that last point (pretty sure, though!); I will do that later tonight.
So, long story short: keep `__call__` as generic as possible.
I can make `__call__` an abstract method and move all of the logic of `update` into the concrete `__call__` implementations directly. That's as generic as it can be, I think. Is this what you have in mind?
That works as well, I think. The main thing would be to not put restrictions on future implementations of this update class.
```diff
             hessian_inv = None
-            f_eval_info = _bfgs_update(
-                f_eval, grad, state.f_info.grad, hessian, hessian_inv, y_diff
+            f_eval_info = self.hessian_update(
```
I like how streamlined this now looks!
I think I addressed all of your comments now @johannahaffner :)
Thanks!
I did go over the requirements we would have for L-BFGS and SLSQP, and it seems that we're good when it comes to the information we need to pass to the update class! Everything can be communicated through the appropriate flavours of function information. (If I run into issues related to dual variables for this, then I'll figure those out myself.)
Lastly, I think it would be nice to add the new `AbstractQuasiNewtonUpdate` to the documentation as well, directly in `minimise.md`. We also list different methods to compute the `NonlinearCG` update there, so I think it would fit nicely. (We could do with some indents, though.)
```python
This is a quasi-Newton optimisation algorithm, whose defining feature is the way
it progressively builds up a Hessian approximation using multiple steps of gradient
information.
```
Could we add something like:
> [optimistix.BFGS][] is generally preferred, since it is more numerically stable on most problems.
I have not convinced myself of this in the sense that I derived the formula or anything, but it does seem to be the general consensus in the literature. (Interestingly, BFGS as an alternative to DFP seems to have been developed independently, rather than jointly, by Broyden, Fletcher, Goldfarb and Shanno - and they all published their work in 1970.)
Sure, I'll add this to the docs 👍
> (Interestingly, BFGS as an alternative to DFP seems to have been developed independently, rather than jointly, by Broyden, Fletcher, Goldfarb and Shanno - and they all published their work in 1970.)
Interesting, I didn't know about this!
Just realised this - sorry for not spotting it earlier!
```python
self.descent = NewtonDescent(linear_solver=lx.Cholesky())
# TODO(raderj): switch out `BacktrackingArmijo` with a better line search.
self.search = BacktrackingArmijo()
self.hessian_update = DFPUpdate(use_inverse=True)
```
I went over the tests again and just spotted this - can we revert to the old behaviour regarding `use_inverse` here? That is

```python
def __init__(self, ..., use_inverse: bool = True):
    ...
    self.hessian_update = BFGSUpdate(use_inverse=use_inverse)
```
Users should definitely be able to choose among these without having to subclass!
Once we're reverting here, the solver definitions in `helpers.py` can be simplified again.
Hm, I can do that, but there are two arguments in favour of how it is now, I'd say:

1. Subclassing becomes a bit weird, as it's possible to provide `use_inverse` to the solver and to the Hessian update. That is quite confusing, I'd say, and one may even pass two different booleans to them, which results in a broken state of the solver.
2. The inverse is only used in the update, so there's no logical need for it to exist on the solver. Hm, ok, but the same argument goes for `rtol, atol, norm, ...`, so this is not a strong argument.

One idea could be that `use_inverse` is an argument to `AbstractQuasiNewtonUpdate.__call__`, and not to the class instance itself?

Do you have any ideas on how to resolve (1.)? I can revert it then :)
Ah, I see that I should have been more clear! I don't want to revert to making it a solver attribute, only to it being an argument of `__init__` for the concrete classes (BFGS and DFP), which is passed on to the update when it is initialised.

This would not affect the behaviour of the abstract class, which is what people should subclass.
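(Concretely, something along these lines for the DFP solver, mirroring the `BFGSUpdate` sketch above - a hypothetical sketch only: the argument list and defaults are placeholders, and `AbstractQuasiNewton`/`DFPUpdate` are the classes introduced by this PR rather than the released API.)

```python
class DFP(AbstractQuasiNewton):
    def __init__(self, rtol, atol, norm=max_norm, use_inverse: bool = True):
        self.rtol = rtol
        self.atol = atol
        self.norm = norm
        self.descent = NewtonDescent(linear_solver=lx.Cholesky())
        self.search = BacktrackingArmijo()
        # `use_inverse` lives on the update, but stays selectable from the solver's __init__.
        self.hessian_update = DFPUpdate(use_inverse=use_inverse)
```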
Sounds good! I'll do that :)
I realise I've been pretty silent on this PR so far. It looks like things are shaping up pretty nicely -- let me know if(/when) you think this is done, and I'll go over it!
Can we merge this into a dev branch? I would then add L-BFGS on top of this before we do the next release. Cheers!
Hi @johannahaffner and @patrick-kidger,
Hi @johannahaffner & @patrick-kidger,
Also, I updated this PR with the recent changes in main. From my side, this is ready for review.
I've just updated the target branch to `dev`.
Awesome, thanks!
Hi @johannahaffner,

as discussed earlier, I added the `update_formula` argument in order to switch between `DFP` and `BFGS`. As both are of the class "quasi-Newton methods", I went ahead and changed the abstract base class name, and renamed the source file accordingly.

I wasn't sure how granular you'd like the type signature of the `bfgs_update` and `dfp_update` functions to be, so I went with a simple blank `Callable` for now. I also added `optx.DFP` to the test suite (but no custom variants, as is currently done for BFGS). I've also not yet added the `termination criteria` argument - I thought it makes sense to separate these into two different PRs, as it doesn't really have anything to do with the DFP algorithm.

Let me know what you think!
Best, Peter
(closes #123)