Two-points adaptation for CMA-ES #88
FTR, could be useful for #97.
TPA from the paper https://hal.inria.fr/hal-00997294v3/document is now available on branch 'tpa_88'. It is compatible with both active CMA and gradient injection. It has been implemented within the ask/eval/tell framework. Preliminary results (under an early 'no-bug' assumption) are very good, especially regarding convergence, and I will report more thoroughly shortly.
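For context, here is a rough sketch of the two-point update as it fits into an ask/eval/tell loop (a minimal illustration in Python; names and constants such as `c_alpha` and `d_sigma` are placeholder assumptions, not the branch code). Two of the lambda candidates are replaced by the mean shifted forward and backward along the previous mean movement, and after evaluation the difference of their ranks drives the step-size:

```python
import numpy as np

def tpa_sigma_update(sigma, s, rank_plus, rank_minus, lam,
                     c_alpha=0.3, d_sigma=4.0):
    """Illustrative TPA step-size update (constants are placeholders).

    rank_plus / rank_minus: ranks (0 = best) among the lam offspring of the
    two injected points x+ = m + sigma*y and x- = m - sigma*y, where y points
    along the previous mean shift.
    """
    alpha = (rank_minus - rank_plus) / (lam - 1.0)  # in [-1, 1]
    s = (1.0 - c_alpha) * s + c_alpha * alpha       # smoothed rank signal
    return sigma * np.exp(s / d_sigma), s           # sigma grows if x+ ranks better
```

On the ask side, the only change is replacing two of the sampled candidates by the injected pair, which is why this combines naturally with gradient injection and active CMA.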
FTR, as it belongs here,
seems a valid choice for TPA, which (a) reflects the experiments in the original paper up to 200-D about as well as sqrt(dim), and (b) is a more sensible choice with popsize=dim or with gradient injections, because it reflects the possible convergence rate in this case.
The new dsigma value is now in the code. I see several choices for importing TPA into production code (these are mutually exclusive options):
And btw, there's an option to specify dsigma for TPA by hand. Runs on BBOB will allow assessing overall performance.
My current favorite would be to make it the default for sep-CMA, for VD-CMA, and when gradient injection is applied. EDIT: also when (full) elitism is used, which is not implemented in the lib yet.
I do witness a problem with TPA + active CMA-ES (with and without gradient). It does not converge anymore to the optimum on standard test functions such as Rosenbrock and the rotated elli, in all dimensions (i.e. starting with 2-D). I've traced the source of the problem to the customized (opposite) points 'injected' in place of two (originally sampled) candidates. The problem goes away with a significant increase of lambda (>> +2). To reproduce:
More testing reveals that TPA is affecting the C_mu^- update of active CMA (using (1)), which appears to be the culprit here: canceling the update in C_mu^- removes the oddity. Will continue to investigate. (1) "Benchmarking a Weighted Negative Covariance Matrix Update on the BBOB-2010 Noiseless Testbed"
This contradicts my results (it works fine for me). We see that the variance in the covariance matrix shrinks quickly while the step-size does not, whereas it should rather be the other way around. My hunch is that the length normalization of the TPA samples is not correct (it's quite tricky to get right): the steps are too long, therefore ranked low, and therefore produce a strong negative update.
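To make the suspected mechanism concrete, here is a simplified illustration of the negative rank-mu contribution in active CMA (a sketch, not the library's actual update; weights and constants are assumptions). If the injected TPA steps are normalized too long, they land among the worst ranks and their outer products are subtracted from C, shrinking the covariance while sigma barely moves:

```python
import numpy as np

def negative_rank_mu_term(C, y_worst, w_neg, c_mu):
    """Simplified negative update term of active CMA (illustrative only).

    y_worst: steps (x_i - m) / sigma of the worst-ranked candidates.
    w_neg:   positive magnitudes of the negative recombination weights.
    """
    Z = sum(w * np.outer(y, y) for w, y in zip(w_neg, y_worst))
    # an over-long step y contributes proportionally to ||y||^2, so a
    # mis-normalized TPA pair can dominate this subtraction
    return C - c_mu * Z
```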
Very useful, thank you! Will report here, as usual.
FTR, I use
with
i.e., the new Mahalanobis norm is approximately
My bad, as this has not been explicitly documented anywhere so far (it is probably the only way to make it work, though).
Thanks, got it. I've implemented it both for the full and the diagonal covariance matrix. I've tried several approximations as well, but sqrt(dim) does not fare well with sep-CMA, nor does ~dim, so for now this uses the full Mahalanobis computation.

In order to uncover any bug and be sure to set TPA as default where it fares best, I've run a slightly more thorough examination (which can be extended to higher dimensions). Below are f-evaluations/successes for various algorithms (the number of successful runs out of 50 is reported in brackets), with/without gradient and with/without TPA (f-target is 1e-8, stagnation is deactivated). In summary, TPA dominates in low dimensions for fsphere with gradient. From more limited runs, I know that results differ in higher dimensions (e.g. 5000-D on Rosenbrock), but those experiments take much longer to run. (The misalignment is due to some strange behavior of markdown... trying to clean it up...)
The results still don't look correct. My TPA implementation takes about the same number of evaluations on the 20 and 40-D ellipsoid as CSA (around 14000 / 50000 vs 14000 / 45000), and ditto on Rosenbrock (where, however, results are more sensitive to the initial conditions). It might be useful to cross-check the result on the Rosenbrock function. With x0=0.1, sigma0=0.1 I see only around 60000 evaluations in 40-D on successful runs.
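As one way to cross-check against a reference implementation (an assumption on my side: the option names below are from recent pycma versions and may not match what was available at the time of this thread):

```python
import cma  # pycma

# 40-D Rosenbrock, x0 = 0.1, sigma0 = 0.1, TPA step-size adaptation
opts = {
    'ftarget': 1e-8,
    'AdaptSigma': cma.sigma_adaptation.CMAAdaptSigmaTPA,  # assumed option/class names
}
res = cma.fmin(cma.ff.rosen, 40 * [0.1], 0.1, opts)
print(res[1])  # best f-value found
```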
Just in case this was not so clear: The Mahalanobis length of both TPA-samples is determined by a single drawing of
In larger dimensions it shouldn't make a big difference whether it is only a single instantiation.
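A minimal sketch of how that normalization could look (my reading of the above, not necessarily the branch code: the single drawing is assumed to be the norm of a standard normal vector, so both injected points get roughly the Mahalanobis length, about sqrt(dim), of a regular sample):

```python
import numpy as np

def tpa_injected_pair(mean, mean_old, sigma, C, rng):
    """Illustrative TPA injection with Mahalanobis length normalization.

    Assumes C is the current (positive definite) covariance matrix and that a
    single draw of ||N(0, I)|| sets the target length for BOTH points.
    """
    dy = mean - mean_old                                      # realized mean shift
    mahal = np.sqrt(dy @ np.linalg.solve(C, dy))              # Mahalanobis norm of the shift
    target = np.linalg.norm(rng.standard_normal(mean.size))   # single drawing for both points
    y = dy * (target / mahal)
    return mean + sigma * y, mean - sigma * y
```

With such a normalization the injected pair is ranked on an equal footing with the regular samples, which should remove the bias towards the overly strong negative updates described above.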
Nailed it, thanks!
Yes, understood (also, I am already using a single drawing for both samples). This is one among a handful of computational improvements on my list then. FYI, another important one is to keep the rank in the Candidate object, which requires a structural change that goes beyond this ticket, I believe. Typically, it would allow for a faster check of the rank of the two samples of interest here, and it relates to using the lib with f-rankings only.
Do you have the complete results after the bug fix? In my experiments TPA tends to be a little worse than CSA, e.g. also with sep-CMA, unless it is only about realizing a fast convergence speed. If you observe the same, maybe it should only become the default under gradient injection.
Here is the complement, only up to 40-D. Pushing further requires a bit of time; let me know if you need 100-D and beyond. Note that the gradient here uses the functional form, so there's no impact on the number of f-evals with or without TPA. This is using ftarget=1e-8.
Functions fsphere, elli and rosenbrock are supported.
EDIT: added 100-D elli for sep-active-CMA and I can reproduce the oddity for TPA + gradient (tolX). |
Defaults are now deactivated on 'dev' and 'tpa_88' branches. Checking on the difference between implementations is next. |
I have found where the difference comes from: I was using
instead of (value in your last Python implementation):
So now elli in 100-D with gradient and TPA converges fine, while the problem is observed on some runs in 200-D. Maybe this can help shed light on the deeper flaw. |
Excellent, that's another bug-fix then, which should improve the performance of sep-CMA-ES in larger dimensions remarkably. The main problem remains to be investigated; I am looking into it.
TPA is working fine AFAIK, so I'm closing this until new details or research has to be merged in. |