Two-points adaptation for CMA-ES #88

Closed

beniz opened this issue Dec 19, 2014 · 24 comments
beniz commented Dec 19, 2014

See:

beniz commented Jan 9, 2015

FTR, could be useful for #97

beniz pushed a commit that referenced this issue Jan 14, 2015
beniz commented Jan 14, 2015

TPA from the paper https://hal.inria.fr/hal-00997294v3/document is now available on the 'tpa_88' branch. It is compatible with both active CMA and gradient injection, and it is implemented within the ask/eval/tell framework.
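For readers unfamiliar with the pattern, the point of ask/eval/tell is that the user drives the objective-function calls between sampling and update. Below is a minimal runnable sketch of the interface using a toy random search as a stand-in (this is not libcmaes' actual C++ API, and `ToyOptimizer` is purely illustrative):

```python
import numpy as np

class ToyOptimizer:
    """Toy random search illustrating the ask/eval/tell interface
    (a stand-in only, NOT libcmaes or CMA-ES itself)."""
    def __init__(self, x0, sigma, lam=10, max_iters=100):
        self.mean = np.asarray(x0, dtype=float)
        self.sigma, self.lam = sigma, lam
        self.iters, self.max_iters = 0, max_iters
    def stop(self):
        return self.iters >= self.max_iters
    def ask(self):  # sample lam candidate solutions
        return [self.mean + self.sigma * np.random.randn(len(self.mean))
                for _ in range(self.lam)]
    def tell(self, candidates, fitnesses):  # update from ranked candidates
        self.mean = candidates[int(np.argmin(fitnesses))]
        self.iters += 1

opt = ToyOptimizer(x0=[3.0, 3.0], sigma=0.5)
while not opt.stop():
    xs = opt.ask()
    fs = [float(np.sum(np.square(x))) for x in xs]  # user-side evaluation (sphere)
    opt.tell(xs, fs)
print(opt.mean)  # ends up near the optimum [0, 0]
```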

Preliminary results (under an early 'no-bug' assumption) are very good, especially for convergence; I will report more thoroughly shortly.

@nikohansen (Collaborator)

FTR, as it belongs here,

```
d_sigma = 4 - 3.6 / sqrt(dim)
```

seems a valid choice for TPA: (a) it reflects the experiments in the original paper up to 200-D about as well as sqrt(dim), and (b) it is a more sensible choice with popsize=dim or with gradient injections, because it reflects the possible convergence rate in those cases.
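To make concrete where d_sigma enters, here is a minimal sketch of a TPA step-size update following the rank-difference formulation of the paper linked above (the function name and the smoothing constant c_s = 0.3 are illustrative assumptions, not libcmaes code):

```python
import numpy as np

def tpa_update_sigma(sigma, s, rank_fwd, rank_bwd, lam, dim, c_s=0.3):
    """One TPA step-size update (sketch).

    rank_fwd / rank_bwd: ranks (0 = best) among the lam offspring of the
    two symmetric points mean +/- sigma*y, where y points along the
    previous mean shift.
    """
    d_sigma = 4 - 3.6 / np.sqrt(dim)   # damping discussed above
    # positive when the point along the previous mean shift ranks better,
    # i.e. longer steps would have paid off -> increase sigma
    z = (rank_bwd - rank_fwd) / (lam - 1)
    s = (1 - c_s) * s + c_s * z        # exponentially smoothed rank difference
    return sigma * np.exp(s / d_sigma), s

# example: forward point ranked first, backward point ranked last
sigma, s = tpa_update_sigma(0.5, 0.0, rank_fwd=0, rank_bwd=9, lam=10, dim=20)
```

With s initialized to 0, the step-size grows as long as the point along the previous mean shift keeps ranking better than its mirror.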

beniz pushed a commit that referenced this issue Jan 21, 2015
beniz commented Jan 21, 2015

The new dsigma value is now in the code.

I see several choices for importing TPA into production code (the options are mutually exclusive).

And btw, there's an option to specify dsigma for TPA by hand.

Runs on BBOB will allow us to assess overall performance.

beniz pushed a commit that referenced this issue Jan 21, 2015
@nikohansen (Collaborator)

My current favorite would be to make it default for sep-CMA, for VD-CMA, and when gradient injection is applied.

EDIT: and when (full) elitism is used, which is not implemented in the lib yet.

beniz pushed a commit that referenced this issue Jan 21, 2015
beniz commented Jan 21, 2015

I'm seeing a problem with TPA + active CMA-ES (with and without gradient). It no longer converges to the optimum on standard test functions such as Rosenbrock and the rotated elli, in all dimensions (i.e. starting at 2-D).

I've traced the source of the problem to the customized (opposite) points injected in place of two of the originally sampled candidates. The problem goes away with a significant increase of lambda (>> +2).

To reproduce:

```
./test_functions -fname rosenbrock -dim 10 -ftarget 1e-8 -alg acmaes -no_stagnation -tpa
```

which yields something similar to the plot below:

(figure: acmaes_tpa_10d)

beniz commented Jan 21, 2015

More testing reveals that TPA affects the C_mu^- update of active CMA (using (1)), which appears to be the culprit here: canceling the C_mu^- update removes the oddity. Will continue to investigate.

(1) "Benchmarking a Weighted Negative Covariance Matrix Update on the BBOB-2010 Noiseless Testbed"

@nikohansen (Collaborator)

This contradicts my results (it works fine for me). We see that the variance in the covariance matrix goes down quickly while the step-size does not, whereas it should rather be the other way around. My hunch is that the length normalization of the TPA samples is not correct (it is quite tricky to get right): the steps are too long, therefore ranked low, and therefore produce a strong negative update.
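To illustrate the suspected mechanism, here is a schematic of the negative rank-weighted update of active CMA (a sketch of the idea only; the function name, signature, and weights are assumptions, not libcmaes code):

```python
import numpy as np

def active_negative_update(C, worst_steps, neg_weights, cmu_minus):
    """Schematic C_mu^- update of active CMA: the worst-ranked steps y_j
    enter with a negative sign, shrinking variance along their directions.
    An over-long, mis-normalized TPA injection is almost surely ranked
    last, so it feeds a strong negative update at every iteration.
    """
    for w, y in zip(neg_weights, worst_steps):
        C -= cmu_minus * w * np.outer(y, y)
    return C
```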

beniz commented Jan 21, 2015

Very useful, thank you! Will report here, as usual.

@nikohansen (Collaborator)

FTR, I use

```python
y = mean - mean_old
# rescale y: a single chi-distributed draw sets its Mahalanobis length
y *= sum(self.randn(len(y))**2)**0.5 / self.mahalanobis_norm(y)
injected_x = mean + y
```

with

```python
def mahalanobis_norm(self, dx):
    """Compute the Mahalanobis norm induced by the adapted sample
    distribution, covariance matrix ``C`` times ``sigma**2``.  The
    expected Mahalanobis distance to the sample mean is about
    ``sqrt(dimension)``.
    """
    return sqrt(sum((self.D**-1. * np.dot(self.B.T, dx))**2)) / self.sigma
```

i.e., the Mahalanobis norm of the injected step is approximately sqrt(dimension), as for any other sample.

My bad, as this has not been explicitly documented anywhere so far (it is probably the only way to make it work, though).
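For a self-contained picture, here is a minimal numpy sketch of the whole injection, assuming an eigendecomposition C = B diag(D**2) B.T as in the snippet above (the function name and signature are illustrative, not libcmaes code):

```python
import numpy as np

def tpa_injection_points(mean, mean_old, B, D, sigma):
    """Return the two TPA points mean +/- y, with y rescaled so that its
    Mahalanobis norm equals one chi-distributed draw, as above.
    B: eigenbasis of C; D: vector of sqrt-eigenvalues, C = B diag(D**2) B.T.
    """
    m = np.asarray(mean, dtype=float)
    y = m - np.asarray(mean_old, dtype=float)
    # Mahalanobis norm of y under sigma**2 * C
    mnorm = np.sqrt(np.sum((D**-1.0 * (B.T @ y))**2)) / sigma
    # one chi draw shared by both points, giving them a sample-typical length
    y *= np.sqrt(np.sum(np.random.randn(len(y))**2)) / mnorm
    return m + y, m - y
```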

beniz commented Jan 22, 2015

Thanks, got it. I've implemented it for both the full and the diagonal covariance matrix. I've tried several approximations as well, but sqrt(dim) does not fare well with sep-CMA, nor does ~dim, so for now the full Mahalanobis computation is used.

In order to uncover any remaining bugs and to make TPA the default where it fares best, I've run a slightly more thorough examination (which can be extended to higher dimensions).

Below are f-evaluations/successes for various algorithms (the number of successful runs out of 50 is reported in brackets), with/without gradient and with/without TPA (f-target is 1e-8, stagnation is deactivated).

In summary, TPA dominates in low dimension on fsphere with gradient. From more limited runs, I know the results differ in higher dimensions (e.g. 5000-D on rosenbrock), but those experiments take much longer to run.

• active-CMA on fsphere:

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | 728.16 (50) | 1056.8 (50) | 583.52 (50) | 646.88 (50) |
| 10 | 1532.6 (50) | 2179.6 (50) | 1055.6 (50) | 865.8 (50) |
| 20 | 2996.16 (50) | 2764.56 (50) | 1745.52 (50) | 788.4 (50) |
| 40 | 5797.5 (50) | 3478.5 (50) | 3032.1 (50) | 1064.4 (50) |

• active-CMA on rosenbrock:

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | 2177.74 (46) | 2902.3 (47) | 2130.73 (41) | 2459.91 (45) |
| 10 | 5859.38 (48) | 13378.8 (33) | 5533.1 (42) | 7595.78 (45) |
| 20 | 20477 (43) | 65544.6 (21) | 15736.5 (45) | 27155.7 (43) |
| 40 | 76284.4 (40) | 137940 (39) | 49167.8 (48) | 87623.8 (46) |

• active-CMA on elli:

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | 1761.12 (50) | 2644.8 (50) | 1486.56 (50) | 1878.4 (50) |
| 10 | 4461.4 (50) | 7380.4 (50) | 3546.4 (50) | 4709.4 (50) |
| 20 | 13180.6 (50) | 22375.4 (50) | 9866.88 (50) | 12596.4 (50) |
| 40 | 44848.5 (50) | 72888.9 (50) | 32805 (50) | 37630.8 (50) |

• sep-active-CMA on fsphere:

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | 657.44 (50) | 864.16 (50) | 517.28 (50) | 608.96 (50) |
| 10 | 1384.6 (50) | 1933 (50) | 913.6 (50) | 1166 (50) |
| 20 | 2687.28 (50) | 3786 (50) | 1516.56 (50) | 1703.04 (50) |
| 40 | 5327.7 (50) | 6632.4 (50) | 2669.1 (50) | 2015.7 (50) |
| 100 | 12495.7 (50) | 10894.6 (50) | 5687.86 (50) | 1354.56 (50) |

• sep-active-CMA on rosenbrock:

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | inf (0) | inf (0) | inf (0) | inf (0) |
| 10 | inf (0) | inf (0) | inf (0) | inf (0) |
| 20 | inf (0) | inf (0) | 78946.9 (43) | inf (0) |
| 40 | 210740 (45) | inf (0) | 181238 (38) | inf (0) |

• sep-active-CMA on elli:

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | 1298.24 (50) | 1938.4 (50) | 1136.8 (50) | 1491.84 (50) |
| 10 | 2941.8 (50) | 4698.4 (50) | 2336.2 (50) | 3179.4 (50) |
| 20 | 6595.92 (50) | 11927.5 (50) | 4612.8 (50) | 7191.6 (50) |
| 40 | 15636.3 (50) | 30640.8 (50) | 10174.2 (50) | 17035.5 (50) |

@nikohansen (Collaborator)

The results still don't look correct. My TPA implementation takes about the same number of evaluations as CSA on the 20- and 40-D ellipsoid (around 14000 / 50000 vs 14000 / 45000), and ditto on Rosenbrock (where, however, results are more sensitive to the initial conditions).

It might be useful to cross-check the result on the Rosenbrock function: with x0=0.1, sigma0=0.1 I see only around 60000 evaluations in 40-D on successful runs.

@nikohansen (Collaborator)

Just in case this was not clear: the Mahalanobis length of both TPA samples is determined by a single draw of

```python
sum(np.random.randn(dimension)**2)**0.5
```

In larger dimensions it shouldn't make much of a difference that only a single instantiation is used.

beniz commented Jan 22, 2015

Nailed it, thanks!

• active-CMA on elli:

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | 1726.88 (50) | 1659.84 (50) | 1470.24 (50) | 1369.92 (50) |
| 10 | 4472.8 (50) | 4432.8 (50) | 3580.6 (50) | 3165.4 (50) |
| 20 | 13200.7 (50) | 14059.4 (50) | 9869.76 (50) | 8974.32 (50) |
| 40 | 45009 (50) | 51042.6 (50) | 32723.4 (50) | 32014.8 (50) |

• active-CMA on rosenbrock, reproduced with:

```
./tpa_tests -dims 5,10,20,40 -runs 50 -alg acmaes -fname rosenbrock
```

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | 2128.68 (47) | 1995.57 (47) | 2012.38 (42) | 2048.59 (41) |
| 10 | 5779.59 (49) | 5818.48 (46) | 6116.5 (40) | 5718.95 (38) |
| 20 | 18750.6 (47) | 19136.8 (45) | 19519.8 (37) | 14428.2 (48) |
| 40 | 67591 (45) | 69697.8 (44) | 58658.4 (41) | 61445 (42) |

beniz commented Jan 22, 2015

> In larger dimensions it shouldn't make much of a difference that only a single instantiation is used.

Yes, understood (and I am already using a single draw for both samples). This is one of a handful of computational improvements on my list, then.

FYI, another important one is to keep the rank in the Candidate object, which requires a structural change that goes beyond this ticket, I believe. Typically, it would allow a faster check of the ranks of the two samples of interest here, and it relates to using the lib with f-rankings only.

@nikohansen (Collaborator)

Do you have the complete results after the bug fix? In my experiments TPA tends to be a little worse than CSA, e.g. also with sep-CMA, unless the goal is only to realize fast convergence. If you observe the same, maybe it should become the default only under gradient injection.

beniz reopened this Jan 24, 2015
beniz commented Jan 24, 2015

Here is the complement, only up to 40-D; pushing further takes a bit of time, so let me know if you need 100-D and beyond. Note that the gradient here uses the functional form, so there is no impact on the number of fevals with or without TPA. This uses ftarget=1e-8.

Runs are reproducible with, for example:

```
./tests/tpa_tests -dims 5,10,20,40 -runs 50 -alg acmaes -fname fsphere
```

The functions fsphere, elli and rosenbrock are supported.

• active-CMA on fsphere:

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | 730.4 (50) | 668.48 (50) | 586.4 (50) | 445.12 (50) |
| 10 | 1525 (50) | 1205.6 (50) | 1048.4 (50) | 644.4 (50) |
| 20 | 2988.48 (50) | 1964.88 (50) | 1731.84 (50) | 815.76 (50) |
| 40 | 5779.8 (50) | 3412.8 (50) | 3034.8 (50) | 1036.5 (50) |
| 100 | 13147.8 (10) | 7403.5 (10) | 6237.3 (10) | 1293.7 (10) |

• sep-active-CMA on fsphere:

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | 666.24 (50) | 601.92 (50) | 520.64 (50) | 403.68 (50) |
| 10 | 1356.6 (50) | 1079.6 (50) | 926.8 (50) | 598.4 (50) |
| 20 | 2709.12 (50) | 1859.52 (50) | 1525.68 (50) | 771.84 (50) |
| 40 | 5300.1 (50) | 3325.2 (50) | 2668.8 (50) | 1006.8 (50) |
| 100 | 12480.4 (50) | 7555.14 (50) | 5687.86 (50) | 1229.78 (50) |

• sep-active-CMA on elli:

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | 1307.84 (50) | 1283.84 (50) | 1127.68 (50) | 1111.36 (50) |
| 10 | 3008.6 (50) | 2810 (50) | 2304.6 (50) | 2299 (50) |
| 20 | 6627.36 (50) | 6550.32 (50) | 4686.48 (50) | 5567.76 (50) |
| 40 | 15414.9 (50) | 16490.1 (50) | 10166.1 (50) | 23457 (50) |
| 100 | 44242.5 (50) | 52542.6 (50) | 26145 (50) | inf (0) |

• sep-active-CMA on rosenbrock:

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | inf (0) | inf (0) | inf (0) | inf (0) |
| 10 | inf (0) | inf (0) | inf (0) | inf (0) |
| 20 | inf (0) | inf (0) | 69486.6 (47) | inf (0) |
| 40 | 194963 (48) | inf (0) | 135850 (47) | inf (0) |

EDIT: added 100-D elli for sep-active-CMA; I can reproduce the oddity for TPA + gradient (tolX).

@nikohansen (Collaborator)

I just tried to replicate the suspicious entry with sep-active-CMA on felli with my Python implementation. Roughly (from at least two runs each), I get:

| D | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|
| 40 | 9500 | 12000 |
| 100 | 27000 | 60000 |
| 200 | 58000 | inf |

Two observations: our implementations differ (this might be a small difference, though, not necessarily a bug), and there is a relevant problem with TPA, which you observe in even smaller dimensions than I do.

(Figure: a run on felli with sep-CMA, TPA, and gradient injection in 200-D.) If the code allowed arbitrary numerical precision it would probably not fail, but the large condition number is undesirable in any case.

The failure seems to occur only when both the gradient and TPA are active.

On Rosenbrock, the result inf is rather misleading, because I can make it work simply by removing the termination criteria tolupsigma and maxiter.

I consider the failure to converge to the optimum on the axis-parallel ellipsoid function a major flaw, not as bad as, but comparable to, a failure on the sphere function.

Conclusion: TPA should not become the default in production code before this problem has been investigated more deeply.

beniz commented Jan 27, 2015

> TPA should not become the default in production code before this problem has been investigated more deeply.

Defaults are now deactivated on the 'dev' and 'tpa_88' branches. Checking the difference between implementations is next.

beniz commented Jan 27, 2015

> Two observations: our implementations differ (this might be a small difference, though, not necessarily a bug)

I have found where the difference comes from: I was using

```
cmu = min(1 - c1, 2 * (mueff - 2 + 1/mueff) / ((N + 2)^2 + mueff))
```

instead of (the value in your latest Python implementation):

```
cmu = min(1 - c1, (0.3 + mueff - 2 + 1/mueff) / (N + 4 * N^0.5 + mueff/2))
```

So now elli in 100-D with gradient and TPA converges fine, while the problem is still observed on some runs in 200-D. Maybe this can help shed light on the deeper flaw.
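For reference, a quick side-by-side of the two learning rates shows how far apart they are in larger dimension (a sketch; the c1 and mueff values below are merely indicative, not taken from either implementation):

```python
import numpy as np

def cmu_full(N, mueff, c1):
    # the rate that had been used (the default full-covariance formula)
    return min(1 - c1, 2 * (mueff - 2 + 1/mueff) / ((N + 2)**2 + mueff))

def cmu_sep(N, mueff, c1):
    # the rate from the reference Python implementation for sep-CMA
    return min(1 - c1, (0.3 + mueff - 2 + 1/mueff) / (N + 4*np.sqrt(N) + mueff/2))

# e.g. N=200 with a typical mueff around 6: the sep rate comes out
# nearly two orders of magnitude larger, hence much faster adaptation
print(cmu_full(200, 6.0, c1=0.001), cmu_sep(200, 6.0, c1=0.001))
```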

@nikohansen (Collaborator)

Excellent, that's another bug fix then, which should improve the performance of sep-CMA-ES in larger dimensions remarkably.

The main problem remains to be investigated; I am looking into it.

beniz pushed a commit that referenced this issue Feb 2, 2015
beniz pushed a commit that referenced this issue Feb 6, 2015
beniz pushed a commit that referenced this issue Feb 6, 2015
beniz commented Feb 23, 2015

TPA is working fine AFAIK, so I'm closing this until new details or research need to be merged in.

beniz closed this as completed Feb 23, 2015
andrewsali pushed commits to andrewsali/libcmaes that referenced this issue Jan 31, 2016