Two-points adaptation for CMA-ES #88

Closed

beniz opened this issue Dec 19, 2014 · 24 comments
beniz commented Dec 19, 2014

See:

beniz commented Jan 9, 2015

FTR, could be useful for #97

beniz pushed a commit that referenced this issue Jan 14, 2015
beniz commented Jan 14, 2015

TPA from the paper https://hal.inria.fr/hal-00997294v3/document is now available on the 'tpa_88' branch. It is compatible with both active CMA and gradient injection, and it is implemented within the ask/eval/tell framework.
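For readers unfamiliar with the pattern, the point of ask/eval/tell is that the user drives the objective-function calls between sampling and update. Below is a minimal runnable sketch of the interface using a toy random search as a stand-in (this is not libcmaes' actual C++ API, and `ToyOptimizer` is purely illustrative):

```python
import numpy as np

class ToyOptimizer:
    """Toy random search illustrating the ask/eval/tell interface
    (a stand-in only, NOT libcmaes or CMA-ES itself)."""
    def __init__(self, x0, sigma, lam=10, max_iters=100):
        self.mean = np.asarray(x0, dtype=float)
        self.sigma, self.lam = sigma, lam
        self.iters, self.max_iters = 0, max_iters
    def stop(self):
        return self.iters >= self.max_iters
    def ask(self):  # sample lam candidate solutions
        return [self.mean + self.sigma * np.random.randn(len(self.mean))
                for _ in range(self.lam)]
    def tell(self, candidates, fitnesses):  # update from ranked candidates
        self.mean = candidates[int(np.argmin(fitnesses))]
        self.iters += 1

opt = ToyOptimizer(x0=[3.0, 3.0], sigma=0.5)
while not opt.stop():
    xs = opt.ask()
    fs = [float(np.sum(np.square(x))) for x in xs]  # user-side evaluation (sphere)
    opt.tell(xs, fs)
print(opt.mean)  # ends up near the optimum [0, 0]
```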

Preliminary results (under an early 'no-bug' assumption) are very good, especially for convergence; I will report more thoroughly shortly.

@nikohansen (Collaborator)

FTR, as it belongs here,

```
d_sigma = 4 - 3.6 / sqrt(dim)
```

seems a valid choice for TPA: (a) it reflects the experiments in the original paper up to 200-D about as well as sqrt(dim), and (b) it is a more sensible choice with popsize=dim or with gradient injections, because it reflects the possible convergence rate in those cases.
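To make concrete where d_sigma enters, here is a minimal sketch of a TPA step-size update following the rank-difference formulation of the paper linked above (the function name and the smoothing constant c_s = 0.3 are illustrative assumptions, not libcmaes code):

```python
import numpy as np

def tpa_update_sigma(sigma, s, rank_fwd, rank_bwd, lam, dim, c_s=0.3):
    """One TPA step-size update (sketch).

    rank_fwd / rank_bwd: ranks (0 = best) among the lam offspring of the
    two symmetric points mean +/- sigma*y, where y points along the
    previous mean shift.
    """
    d_sigma = 4 - 3.6 / np.sqrt(dim)   # damping discussed above
    # positive when the point along the previous mean shift ranks better,
    # i.e. longer steps would have paid off -> increase sigma
    z = (rank_bwd - rank_fwd) / (lam - 1)
    s = (1 - c_s) * s + c_s * z        # exponentially smoothed rank difference
    return sigma * np.exp(s / d_sigma), s

# example: forward point ranked first, backward point ranked last
sigma, s = tpa_update_sigma(0.5, 0.0, rank_fwd=0, rank_bwd=9, lam=10, dim=20)
```

With s initialized to 0, the step-size grows as long as the point along the previous mean shift keeps ranking better than its mirror.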

beniz pushed a commit that referenced this issue Jan 21, 2015
beniz commented Jan 21, 2015

The new dsigma value is now in the code.

I see several choices for importing TPA into production code (the options are mutually exclusive).

And btw, there's an option to specify dsigma for TPA by hand.

Runs on BBOB will allow us to assess overall performance.

beniz pushed a commit that referenced this issue Jan 21, 2015
@nikohansen (Collaborator)

My current favorite would be to make it default for sep-CMA, for VD-CMA, and when gradient injection is applied.

EDIT: and when (full) elitism is used, which is not implemented in the lib yet.

beniz pushed a commit that referenced this issue Jan 21, 2015
beniz commented Jan 21, 2015

I'm seeing a problem with TPA + active CMA-ES (with and without gradient). It no longer converges to the optimum on standard test functions such as Rosenbrock and the rotated elli, in all dimensions (i.e. starting at 2-D).

I've traced the source of the problem to the customized (opposite) points injected in place of two of the originally sampled candidates. The problem goes away with a significant increase of lambda (>> +2).

To reproduce:

```
./test_functions -fname rosenbrock -dim 10 -ftarget 1e-8 -alg acmaes -no_stagnation -tpa
```

which yields something similar to the plot below:

(figure: acmaes_tpa_10d)

beniz commented Jan 21, 2015

More testing reveals that TPA affects the C_mu^- update of active CMA (using (1)), which appears to be the culprit here: canceling the C_mu^- update removes the oddity. Will continue to investigate.

(1) "Benchmarking a Weighted Negative Covariance Matrix Update on the BBOB-2010 Noiseless Testbed"

@nikohansen (Collaborator)

This contradicts my results (it works fine for me). We see that the variance in the covariance matrix goes down quickly while the step-size does not, whereas it should rather be the other way around. My hunch is that the length normalization of the TPA samples is not correct (it is quite tricky to get right): the steps are too long, therefore ranked low, and therefore produce a strong negative update.
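To illustrate the suspected mechanism, here is a schematic of the negative rank-weighted update of active CMA (a sketch of the idea only; the function name, signature, and weights are assumptions, not libcmaes code):

```python
import numpy as np

def active_negative_update(C, worst_steps, neg_weights, cmu_minus):
    """Schematic C_mu^- update of active CMA: the worst-ranked steps y_j
    enter with a negative sign, shrinking variance along their directions.
    An over-long, mis-normalized TPA injection is almost surely ranked
    last, so it feeds a strong negative update at every iteration.
    """
    for w, y in zip(neg_weights, worst_steps):
        C -= cmu_minus * w * np.outer(y, y)
    return C
```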

beniz commented Jan 21, 2015

Very useful, thank you! Will report here, as usual.

@nikohansen (Collaborator)

FTR, I use

```python
y = mean - mean_old
# rescale y: a single chi-distributed draw sets its Mahalanobis length
y *= sum(self.randn(len(y))**2)**0.5 / self.mahalanobis_norm(y)
injected_x = mean + y
```

with

```python
def mahalanobis_norm(self, dx):
    """Compute the Mahalanobis norm induced by the adapted sample
    distribution, covariance matrix ``C`` times ``sigma**2``.  The
    expected Mahalanobis distance to the sample mean is about
    ``sqrt(dimension)``.
    """
    return sqrt(sum((self.D**-1. * np.dot(self.B.T, dx))**2)) / self.sigma
```

i.e., the Mahalanobis norm of the injected step is approximately sqrt(dimension), as for any other sample.

My bad, as this has not been explicitly documented anywhere so far (it is probably the only way to make it work, though).
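For a self-contained picture, here is a minimal numpy sketch of the whole injection, assuming an eigendecomposition C = B diag(D**2) B.T as in the snippet above (the function name and signature are illustrative, not libcmaes code):

```python
import numpy as np

def tpa_injection_points(mean, mean_old, B, D, sigma):
    """Return the two TPA points mean +/- y, with y rescaled so that its
    Mahalanobis norm equals one chi-distributed draw, as above.
    B: eigenbasis of C; D: vector of sqrt-eigenvalues, C = B diag(D**2) B.T.
    """
    m = np.asarray(mean, dtype=float)
    y = m - np.asarray(mean_old, dtype=float)
    # Mahalanobis norm of y under sigma**2 * C
    mnorm = np.sqrt(np.sum((D**-1.0 * (B.T @ y))**2)) / sigma
    # one chi draw shared by both points, giving them a sample-typical length
    y *= np.sqrt(np.sum(np.random.randn(len(y))**2)) / mnorm
    return m + y, m - y
```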

beniz commented Jan 22, 2015

Thanks, got it. I've implemented it for both the full and the diagonal covariance matrix. I've tried several approximations as well, but sqrt(dim) does not fare well with sep-CMA, nor does ~dim, so for now the full Mahalanobis computation is used.

In order to uncover any remaining bugs and to make TPA the default where it fares best, I've run a slightly more thorough examination (which can be extended to higher dimensions).

Below are f-evaluations/successes for various algorithms (the number of successful runs out of 50 is reported in brackets), with/without gradient and with/without TPA (f-target is 1e-8, stagnation is deactivated).

In summary, TPA dominates in low dimension on fsphere with gradient. From more limited runs, I know the results differ in higher dimensions (e.g. 5000-D on rosenbrock), but those experiments take much longer to run.

• active-CMA on fsphere:

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | 728.16 (50) | 1056.8 (50) | 583.52 (50) | 646.88 (50) |
| 10 | 1532.6 (50) | 2179.6 (50) | 1055.6 (50) | 865.8 (50) |
| 20 | 2996.16 (50) | 2764.56 (50) | 1745.52 (50) | 788.4 (50) |
| 40 | 5797.5 (50) | 3478.5 (50) | 3032.1 (50) | 1064.4 (50) |

• active-CMA on rosenbrock:

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | 2177.74 (46) | 2902.3 (47) | 2130.73 (41) | 2459.91 (45) |
| 10 | 5859.38 (48) | 13378.8 (33) | 5533.1 (42) | 7595.78 (45) |
| 20 | 20477 (43) | 65544.6 (21) | 15736.5 (45) | 27155.7 (43) |
| 40 | 76284.4 (40) | 137940 (39) | 49167.8 (48) | 87623.8 (46) |

• active-CMA on elli:

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | 1761.12 (50) | 2644.8 (50) | 1486.56 (50) | 1878.4 (50) |
| 10 | 4461.4 (50) | 7380.4 (50) | 3546.4 (50) | 4709.4 (50) |
| 20 | 13180.6 (50) | 22375.4 (50) | 9866.88 (50) | 12596.4 (50) |
| 40 | 44848.5 (50) | 72888.9 (50) | 32805 (50) | 37630.8 (50) |

• sep-active-CMA on fsphere:

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | 657.44 (50) | 864.16 (50) | 517.28 (50) | 608.96 (50) |
| 10 | 1384.6 (50) | 1933 (50) | 913.6 (50) | 1166 (50) |
| 20 | 2687.28 (50) | 3786 (50) | 1516.56 (50) | 1703.04 (50) |
| 40 | 5327.7 (50) | 6632.4 (50) | 2669.1 (50) | 2015.7 (50) |
| 100 | 12495.7 (50) | 10894.6 (50) | 5687.86 (50) | 1354.56 (50) |

• sep-active-CMA on rosenbrock:

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | inf (0) | inf (0) | inf (0) | inf (0) |
| 10 | inf (0) | inf (0) | inf (0) | inf (0) |
| 20 | inf (0) | inf (0) | 78946.9 (43) | inf (0) |
| 40 | 210740 (45) | inf (0) | 181238 (38) | inf (0) |

• sep-active-CMA on elli:

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | 1298.24 (50) | 1938.4 (50) | 1136.8 (50) | 1491.84 (50) |
| 10 | 2941.8 (50) | 4698.4 (50) | 2336.2 (50) | 3179.4 (50) |
| 20 | 6595.92 (50) | 11927.5 (50) | 4612.8 (50) | 7191.6 (50) |
| 40 | 15636.3 (50) | 30640.8 (50) | 10174.2 (50) | 17035.5 (50) |

@nikohansen (Collaborator)

The results still don't look correct. My TPA implementation takes about the same number of evaluations as CSA on the 20- and 40-D ellipsoid (around 14000 / 50000 vs 14000 / 45000), and ditto on Rosenbrock (where, however, results are more sensitive to the initial conditions).

It might be useful to cross-check the result on the Rosenbrock function: with x0=0.1, sigma0=0.1 I see only around 60000 evaluations in 40-D on successful runs.

@nikohansen (Collaborator)

Just in case this was not clear: the Mahalanobis length of both TPA samples is determined by a single draw of

```python
sum(np.random.randn(dimension)**2)**0.5
```

In larger dimensions it shouldn't make much of a difference that only a single instantiation is used.

beniz commented Jan 22, 2015

Nailed it, thanks!

• active-CMA on elli:

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | 1726.88 (50) | 1659.84 (50) | 1470.24 (50) | 1369.92 (50) |
| 10 | 4472.8 (50) | 4432.8 (50) | 3580.6 (50) | 3165.4 (50) |
| 20 | 13200.7 (50) | 14059.4 (50) | 9869.76 (50) | 8974.32 (50) |
| 40 | 45009 (50) | 51042.6 (50) | 32723.4 (50) | 32014.8 (50) |

• active-CMA on rosenbrock, reproduced with:

```
./tpa_tests -dims 5,10,20,40 -runs 50 -alg acmaes -fname rosenbrock
```

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | 2128.68 (47) | 1995.57 (47) | 2012.38 (42) | 2048.59 (41) |
| 10 | 5779.59 (49) | 5818.48 (46) | 6116.5 (40) | 5718.95 (38) |
| 20 | 18750.6 (47) | 19136.8 (45) | 19519.8 (37) | 14428.2 (48) |
| 40 | 67591 (45) | 69697.8 (44) | 58658.4 (41) | 61445 (42) |

beniz commented Jan 22, 2015

> In larger dimensions it shouldn't make much of a difference that only a single instantiation is used.

Yes, understood (and I am already using a single draw for both samples). This is one of a handful of computational improvements on my list, then.

FYI, another important one is to keep the rank in the Candidate object, which requires a structural change that goes beyond this ticket, I believe. Typically, it would allow a faster check of the ranks of the two samples of interest here, and it relates to using the lib with f-rankings only.

@nikohansen (Collaborator)

Do you have the complete results after the bug fix? In my experiments TPA tends to be a little worse than CSA, e.g. also with sep-CMA, unless the goal is only to realize fast convergence. If you observe the same, maybe it should become the default only under gradient injection.

beniz reopened this Jan 24, 2015
beniz commented Jan 24, 2015

Here is the complement, only up to 40-D; pushing further takes a bit of time, so let me know if you need 100-D and beyond. Note that the gradient here uses the functional form, so there is no impact on the number of fevals with or without TPA. This uses ftarget=1e-8.

Runs are reproducible with, for example:

```
./tests/tpa_tests -dims 5,10,20,40 -runs 50 -alg acmaes -fname fsphere
```

The functions fsphere, elli and rosenbrock are supported.

• active-CMA on fsphere:

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | 730.4 (50) | 668.48 (50) | 586.4 (50) | 445.12 (50) |
| 10 | 1525 (50) | 1205.6 (50) | 1048.4 (50) | 644.4 (50) |
| 20 | 2988.48 (50) | 1964.88 (50) | 1731.84 (50) | 815.76 (50) |
| 40 | 5779.8 (50) | 3412.8 (50) | 3034.8 (50) | 1036.5 (50) |
| 100 | 13147.8 (10) | 7403.5 (10) | 6237.3 (10) | 1293.7 (10) |

• sep-active-CMA on fsphere:

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | 666.24 (50) | 601.92 (50) | 520.64 (50) | 403.68 (50) |
| 10 | 1356.6 (50) | 1079.6 (50) | 926.8 (50) | 598.4 (50) |
| 20 | 2709.12 (50) | 1859.52 (50) | 1525.68 (50) | 771.84 (50) |
| 40 | 5300.1 (50) | 3325.2 (50) | 2668.8 (50) | 1006.8 (50) |
| 100 | 12480.4 (50) | 7555.14 (50) | 5687.86 (50) | 1229.78 (50) |

• sep-active-CMA on elli:

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | 1307.84 (50) | 1283.84 (50) | 1127.68 (50) | 1111.36 (50) |
| 10 | 3008.6 (50) | 2810 (50) | 2304.6 (50) | 2299 (50) |
| 20 | 6627.36 (50) | 6550.32 (50) | 4686.48 (50) | 5567.76 (50) |
| 40 | 15414.9 (50) | 16490.1 (50) | 10166.1 (50) | 23457 (50) |
| 100 | 44242.5 (50) | 52542.6 (50) | 26145 (50) | inf (0) |

• sep-active-CMA on rosenbrock:

| D | fevals_avg | fevals_avg_tpa | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|---|---|
| 5 | inf (0) | inf (0) | inf (0) | inf (0) |
| 10 | inf (0) | inf (0) | inf (0) | inf (0) |
| 20 | inf (0) | inf (0) | 69486.6 (47) | inf (0) |
| 40 | 194963 (48) | inf (0) | 135850 (47) | inf (0) |

EDIT: added 100-D elli for sep-active-CMA; I can reproduce the oddity for TPA + gradient (tolX).

@nikohansen (Collaborator)

I just tried to replicate the suspicious entry with sep-active-CMA on felli with my Python implementation. Roughly (from at least two runs each), I get:

| D | fevals_avg_gi | fevals_avg_gi_tpa |
|---|---|---|
| 40 | 9500 | 12000 |
| 100 | 27000 | 60000 |
| 200 | 58000 | inf |

Two observations: our implementations differ (this might be a small difference, though, not necessarily a bug), and there is a relevant problem with TPA, which you observe in even smaller dimensions than I do.

(Figure: a run on felli with sep-CMA, TPA, and gradient injection in 200-D.) If the code allowed arbitrary numerical precision it would probably not fail, but the large condition number is undesirable in any case.

The failure seems to occur only when both the gradient and TPA are active.

On Rosenbrock, the result inf is rather misleading, because I can make it work simply by removing the termination criteria tolupsigma and maxiter.

I consider the failure to converge to the optimum on the axis-parallel ellipsoid function a major flaw, not as bad as, but comparable to, a failure on the sphere function.

Conclusion: TPA should not become the default in production code before this problem has been investigated more deeply.

beniz commented Jan 27, 2015

> TPA should not become the default in production code before this problem has been investigated more deeply.

Defaults are now deactivated on the 'dev' and 'tpa_88' branches. Checking the difference between implementations is next.

beniz commented Jan 27, 2015

> Two observations: our implementations differ (this might be a small difference, though, not necessarily a bug)

I have found where the difference comes from: I was using

```
cmu = min(1 - c1, 2 * (mueff - 2 + 1/mueff) / ((N + 2)^2 + mueff))
```

instead of (the value in your latest Python implementation):

```
cmu = min(1 - c1, (0.3 + mueff - 2 + 1/mueff) / (N + 4 * N^0.5 + mueff/2))
```

So now elli in 100-D with gradient and TPA converges fine, while the problem is still observed on some runs in 200-D. Maybe this can help shed light on the deeper flaw.
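For reference, a quick side-by-side of the two learning rates shows how far apart they are in larger dimension (a sketch; the c1 and mueff values below are merely indicative, not taken from either implementation):

```python
import numpy as np

def cmu_full(N, mueff, c1):
    # the rate that had been used (the default full-covariance formula)
    return min(1 - c1, 2 * (mueff - 2 + 1/mueff) / ((N + 2)**2 + mueff))

def cmu_sep(N, mueff, c1):
    # the rate from the reference Python implementation for sep-CMA
    return min(1 - c1, (0.3 + mueff - 2 + 1/mueff) / (N + 4*np.sqrt(N) + mueff/2))

# e.g. N=200 with a typical mueff around 6: the sep rate comes out
# nearly two orders of magnitude larger, hence much faster adaptation
print(cmu_full(200, 6.0, c1=0.001), cmu_sep(200, 6.0, c1=0.001))
```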

@nikohansen (Collaborator)

Excellent, that's another bug fix then, which should improve the performance of sep-CMA-ES in larger dimensions remarkably.

The main problem remains to be investigated; I am looking into it.

beniz pushed a commit that referenced this issue Feb 2, 2015
beniz pushed a commit that referenced this issue Feb 6, 2015
beniz pushed a commit that referenced this issue Feb 6, 2015
beniz commented Feb 23, 2015

TPA is working fine AFAIK, so I'm closing this until new details or research need to be merged in.

beniz closed this as completed Feb 23, 2015
andrewsali pushed commits to andrewsali/libcmaes that referenced this issue Jan 31, 2016