
Add MLDA stepper #3926

Merged: 89 commits into pymc-devs:master, Sep 15, 2020

Conversation

@gmingas (Contributor) commented May 18, 2020

The MLDA stepper

The MLDA (Multi-Level Delayed Acceptance) [1] method employs a hierarchy of chains to sample from a posterior. The top-level chain samples from the fine model posterior (i.e. the most accurate one / the one we are interested in) and the lower level chains sample from coarse model posteriors (i.e. approximations of declining accuracy).

MLDA uses samples generated in one level as proposals for the level above. A chain runs for a fixed number of iterations and then the last sample is used as the proposal for the higher-level chain. The bottom level is a Metropolis sampler, although an algorithm like pCN or adaptive MH is expected to be used in the future.

The stepper is suitable for situations where a model is expensive to evaluate at high spatial or temporal resolution (e.g. realistic subsurface flow models where a PDE needs to be solved in each iteration). In those cases we can use cheaper, lower resolutions as coarse models. If the approximations are sufficiently good, this leads to good quality proposals, high acceptance rates and high ES/sec (effective samples per second) compared to other methods, because only a few expensive fine-model solves are needed to achieve the same ESS.
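To make the mechanics concrete, here is a minimal two-level sketch of the delayed-acceptance step (illustrative names only, not the PR's actual code): the coarse chain runs for a fixed number of iterations from the current fine-level state, its last sample becomes the fine-level proposal, and the acceptance ratio corrects for the coarse posterior that generated it.

```python
import numpy as np

def mlda_step(q0, log_post_fine, log_post_coarse, coarse_step, subsampling_rate, rng):
    # Run the coarse-level chain for a fixed number of iterations,
    # starting from the current fine-level state q0.
    q = q0
    for _ in range(subsampling_rate):
        q = coarse_step(q)
    proposal = q  # the last coarse sample becomes the fine-level proposal

    # Delayed-acceptance ratio: the coarse log-posterior terms correct
    # for the proposal having been generated by the coarse chain.
    log_alpha = (log_post_fine(proposal) - log_post_fine(q0)
                 + log_post_coarse(q0) - log_post_coarse(proposal))
    if np.log(rng.uniform()) < min(0.0, log_alpha):
        return proposal
    return q0
```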

Changes

  • MLDA is a new ArrayStepShared step method in metropolis.py. Its constructor instantiates step methods for all the levels in the hierarchy.
  • RecursiveDA is a new Proposal subclass in metropolis.py. It recursively calls sample() to run the hierarchy of chains across levels, using the step methods instantiated by MLDA.__init__ (see the sketch after this list). Note that logging is switched off here to avoid unwanted console output. Each sample() call generates a fixed number of samples; we keep only the last one and return control to the level above. Whenever control comes back to a level, sample() is rerun from that point.
  • A new internal variable called is_mlda_base is added to Metropolis. It is detected within sample() to avoid running reset_tuning(), because we want the Metropolis step at the bottom level to maintain its tuning information across subsequent sample() calls. Note that resetting was introduced by #3733 (Metropolis chain tuning differs between single- and multiprocessing). MLDA is not affected by the issue described there, as it always uses cores=1 and chains=1 at each level of the hierarchy. A more elegant solution might be to detect cores and chains and pass that information to _iter_sample to decide whether reset_tuning runs.
  • Tests for MLDA and RecursiveDA, plus two new models for testing in tests/models.py.
  • A notebook comparing the performance of MLDA with Metropolis for a groundwater flow model (a Bayesian inverse problem setting). The forward model requires a PDE solve, which is done using the C-based FEniCS library (see notebook for details); the notebook is thus also a demonstration of using black-box external code in the likelihood. The PR adds all the necessary code to solve the model. The example is also provided in the form of a Python script.
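As referenced in the RecursiveDA bullet above, here is a hedged sketch of the recursive pattern (the helper names are stand-ins, not the PR's actual functions): each level draws a short subchain from the level below and keeps only its final state as its proposal.

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis_step(q):
    # Stand-in base sampler: a simple random-walk move (illustration only).
    return q + rng.normal(scale=0.1, size=np.shape(q))

def da_accept(level, q, candidate):
    # Stand-in for the delayed-acceptance test at this level (see the
    # acceptance-ratio sketch above); accepting unconditionally here just
    # keeps the example self-contained.
    return candidate

def propose_from_below(level, q0, subsampling_rate):
    # Bottom of the hierarchy: one step of the base sampler.
    if level == 0:
        return metropolis_step(q0)
    # Intermediate level: run a short subchain whose proposals come from
    # the level below, then hand back only its final state.
    q = q0
    for _ in range(subsampling_rate):
        candidate = propose_from_below(level - 1, q, subsampling_rate)
        q = da_accept(level, q, candidate)
    return q
```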

Performance

Running the notebook on a MacBook Pro (see specs within), with 3 unknown parameters and model resolutions of (30, 30) and (120, 120), the resulting performance comparison is the following:

[image: performance comparison results]

Work in progress

  • I am trying to find ways to accelerate the method. It seems that there is a lot of overhead from creating/destroying traces and from Theano parsing the likelihood when calling sample() multiple times. Stopping and restarting is necessary because I need to switch between chains and continue a chain from where it last left off. Are there faster ways to pause and resume a chain than saving the trace/tuning and calling a new sample() afterwards? Would iter_sample be more efficient in this case? (See the sketch after this list.)
  • I am working on adding an adaptive bottom-level sampler and also on applying an adaptive likelihood correction technique found in [2]. Is there any ongoing work on adaptive samplers like adaptive MH or pCN within the PyMC3 community?
  • I plan to add one more feature contained in [1] (the variance reduction technique to reduce variance of integral estimates).
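For reference, a minimal sketch of the iter_sample pattern asked about in the first item above: PyMC3's iter_sample is a generator that yields the growing trace one draw at a time, so control can return to the caller between draws without the setup cost of a fresh sample() call. The toy model is only there to make the snippet self-contained.

```python
import pymc3 as pm

with pm.Model():
    x = pm.Normal("x", mu=0, sigma=1)
    step = pm.Metropolis()
    # Each iteration yields the trace so far; the loop body could hand
    # control to a higher level and resume the chain afterwards.
    for trace in pm.iter_sample(100, step):
        pass
    last_point = trace.point(-1)  # the most recent draw
```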

Usage

MLDA can be used like any other step method, but it has one required positional argument (coarse_models). See the notebook and tests for examples, and the sketch below.
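A minimal usage sketch under the API described above, with toy Gaussian models standing in for the notebook's PDE setup (here the coarse model simply uses subsampled data as its cheap approximation):

```python
import numpy as np
import pymc3 as pm

data = np.random.randn(1000)

# Cheap approximation: same model structure, subsampled data.
with pm.Model() as coarse_model:
    x = pm.Normal("x", mu=0, sigma=10)
    pm.Normal("obs", mu=x, sigma=1, observed=data[::10])

# Fine model: the posterior we actually want to sample from.
with pm.Model() as fine_model:
    x = pm.Normal("x", mu=0, sigma=10)
    pm.Normal("obs", mu=x, sigma=1, observed=data)
    step = pm.MLDA(coarse_models=[coarse_model])  # coarsest first
    trace = pm.sample(draws=2000, step=step, chains=1)
```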

References

[1] Dodwell, T., Ketelsen, C., Scheichl, R. & Teckentrup, A. (2019). Multilevel Markov Chain Monte Carlo. SIAM Review, 61, 509-545. https://doi.org/10.1137/19M126966X
[2] Cui, T., Fox, C. & O'Sullivan, M. (2012). Adaptive Error Modelling in MCMC Sampling for Large Scale Inverse Problems.


@twiecki (Member) commented May 18, 2020

This looks really cool and massive, @gmingas. Thanks for the contribution. I'll wait to see if someone jumps on checking this out in more detail. Some questions:

  1. What type of feedback are you looking for at this stage? If there is serious work that you still plan to do, it makes more sense to wait with a review until you consider it final, as reviewing a PR of this size is quite a time investment.
  2. Could this sampler live outside of PyMC3, as an extension module? A related question is how widely applicable this sampler is, especially compared to NUTS. I suppose this doesn't require gradients, which would call for a comparison with MH (which you did somewhat) as well as with DE-Metropolis.

@junpenglao (Member) commented:

Adding @aseyboldt as reviewer, as he has been working with PDEs.

@gmingas (Contributor, Author) commented May 19, 2020

> 1. What type of feedback are you looking for at this stage? If there is serious work that you still plan to do, it makes more sense to wait with a review until you consider it final, as reviewing a PR of this size is quite a time investment.

In its current form the stepper is usable and has benefits for the type of problems described above (also see below). There are a few extra features that I would like to add in the near future. These will be enhancements/add-ons rather than refactoring/rewriting, so it might be worth reviewing it in its current form. I published this as WIP to gauge initial reactions and prioritise next steps. In terms of feedback, I was thinking about a few things:

  • Comments/preferences on future development and features from people familiar with multilevel models and MCMC methods, or users of them.
  • Ways in which the recursive part of the code can be accelerated, by avoiding repeated calls to sample() or by making them more efficient.
  • Comments on code style, the example provided, and extra tests that might be needed.
  • Discussion on adaptive enhancements to the algorithm, i.e. what adaptive algorithm to use at the bottom level (is there any work in progress on adaptive methods in PyMC3?) and how to correct for errors between approximations; at the moment I am considering the method described in Cui et al. 2012 (link in the initial PR post).

> 2. Could this sampler live outside of PyMC3, as an extension module? A related question is how widely applicable this sampler is, especially compared to NUTS. I suppose this doesn't require gradients, which would call for a comparison with MH (which you did somewhat) as well as with DE-Metropolis.

In terms of applicability, it does not require gradients. It does require that you provide at least one coarse (approximate) model in addition to the main model. Despite that, I think the sampler is general enough to live inside PyMC3, as there are many people working with likelihoods where an approximate/coarse version can be defined: inverse problems for various physical systems where spatial resolution can vary and PDEs are usually necessary (e.g. subsurface fluid transport, the resistivity/conductivity field in impedance imaging, ultrasound imaging, emission tomography, the flow field of ocean circulation). MLDA can reduce the computational cost in those cases. There are also other problems where an approximate model can be defined (it would be good to find out about possible applications from people). Note also that in many of these problems it might not be easy to get gradients for use with NUTS.

A comparison with DE-Metropolis is indeed needed, and I will work on this. Apart from the ES/sec metric, what other metrics would you like for benchmarking and scaling exploration?

@volpatto commented May 19, 2020

Sorry for inserting myself into this discussion (I'm not a PyMC3 dev, but a user): I think this feature is absolutely amazing! This is related to the MLMC work Mike Giles has been doing, right? That's fantastic; the only library I know of that provides this in a generic way is QUESO. Just for reference, I wrote a post about it on the PyMC3 Discourse.

If you are looking for user feedback, I can try it out on recent work I have been doing in a COVID-19 research group, with the potential to change our current method (SMC from PyMC3) to your new method. The group has previous experience using MLMC from QUESO.

And it's nice that you tested it with FEniCS. I use Firedrake in my PhD, which also uses UFL and has almost the same interface. This PR helps me a lot in several ways; thank you for your effort.

@twiecki (Member) commented May 19, 2020

@volpatto That's valuable feedback. Certainly, if you'd be willing to help test and review, it would be very helpful and much appreciated.

@volpatto commented May 19, 2020

> @volpatto That's valuable feedback. Certainly, if you'd be willing to help test and review, it would be very helpful and much appreciated.

Thanks, @twiecki! I don't know PyMC3 internals well, but I think I can give some help (with my experience as a Python developer).

Actually, this is interesting because I have been wanting to implement MLMC in PyMC3 (a basic version; the version in this PR is improved and I need to study it), and I think it can be really useful.

Comment on lines 1106 to 1119
"""One MLDA step, given current sample q0"""
# Check if the tuning flag has been changed and if yes,
# change the proposal's tuning flag and reset self.accepted
# This is triggered by iter_sample while the highest-level MLDA step
# method is running. It then propagates to all levels.
if self.proposal_dist.tune != self.tune:
self.proposal_dist.tune = self.tune
# set tune in sub-methods of compound stepper explicitly because
# it is not set within sample() (only the CompoundStep's tune flag is)
if isinstance(self.next_step_method, CompoundStep):
for method in self.next_step_method.methods:
method.tune = self.tune
self.accepted = 0
@gmingas (Contributor, Author) commented:

Here, if the bottom-level Metropolis methods are within a CompoundStep, I manually set their tune flags.

CompoundStep has a stop_tuning() function that is called by _iter_sample to set tune=False for all the methods contained in it, but the Metropolis class does not have a stop_tuning() function, so the Metropolis flags are not set.

If there is no reason not to, should I add a stop_tuning() to Metropolis? (A sketch follows.)
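A minimal sketch of what that addition could look like (a standalone stub for illustration; the real Metropolis class in metropolis.py has many more attributes):

```python
class Metropolis:
    def __init__(self):
        self.tune = True  # tuning is active during the tuning phase

    def stop_tuning(self):
        # Mirror CompoundStep.stop_tuning, so the sampling loop can switch
        # tuning off uniformly whether or not steps are wrapped in a
        # CompoundStep.
        self.tune = False
```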

@tjdodwell commented:
@volpatto thanks for your interest in our contribution. Yes, it builds directly on Mike Giles' work: his original Multilevel Monte Carlo (MLMC) addressed just the forward problem. This was then extended to the Bayesian setting, a.k.a. Multilevel Markov Chain Monte Carlo, which can be found here. Multilevel delayed acceptance is very close to this, and the connection is discussed in the MLMCMC paper. A paper will follow soon, but ask if you have any questions.

@gmingas (Contributor, Author) commented May 20, 2020

@volpatto thanks for the comments. It would be great if you could try it out on the COVID work you mentioned. I would be interested to see whether you notice any unexpected behaviour or bugs, and which extra features and improvements would benefit your application. Also, how performance compares to SMC and QUESO. We could potentially add new examples and tests based on your application.

@volpatto commented May 21, 2020

> @volpatto thanks for your interest in our contribution. Yes, it builds directly on Mike Giles' work: his original Multilevel Monte Carlo (MLMC) addressed just the forward problem. This was then extended to the Bayesian setting, a.k.a. Multilevel Markov Chain Monte Carlo, which can be found here. Multilevel delayed acceptance is very close to this, and the connection is discussed in the MLMCMC paper. A paper will follow soon, but ask if you have any questions.

I see! I have been reading this paper (btw, nicely done) to get a proper understanding of your method. As soon as I can, I'll come back here to help, hopefully this week.

> @volpatto thanks for the comments. It would be great if you could try it out on the COVID work you mentioned. I would be interested to see whether you notice any unexpected behaviour or bugs, and which extra features and improvements would benefit your application. Also, how performance compares to SMC and QUESO. We could potentially add new examples and tests based on your application.

Amazing! Looking forward to doing some tests with MLDA. I'll prepare a simplified SIRD/SEIRD model calibration and forward UQ with SMC, so we can compare it with MLDA. Would it be helpful if I fork your PyMC3 fork and submit a PR (with the ipynb) pointing to this branch, @gmingas?

Cheers!

@gmingas (Contributor, Author) commented May 21, 2020

> Amazing! Looking forward to doing some tests with MLDA. I'll prepare a simplified SIRD/SEIRD model calibration and forward UQ with SMC, so we can compare it with MLDA. Would it be helpful if I fork your PyMC3 fork and submit a PR (with the ipynb) pointing to this branch, @gmingas?

@volpatto Sure, this sounds great if you are willing to do it! Let me know if anything is unclear or if I can help in any other way.

@gmingas (Contributor, Author) commented May 21, 2020

> 2. Could this sampler live outside of PyMC3, as an extension module? A related question is how widely applicable this sampler is, especially compared to NUTS. I suppose this doesn't require gradients, which would call for a comparison with MH (which you did somewhat) as well as with DE-Metropolis.

I have added a comparison with DEMetropolis to the existing notebook here; a summary is shown below:

[image: DEMetropolis comparison summary]

I also included a second notebook here with the same comparison done using blocked sampling for Metropolis and MLDA:

[image: blocked-sampling comparison summary]

@michaelosthege (Member) commented:
Hi, I'm currently just on my phone, but skimming through the notebook: please note that the effective sample size is not reliable if the chains have not converged.

@gmingas (Contributor, Author) commented May 26, 2020

> Hi, I'm currently just on my phone, but skimming through the notebook: please note that the effective sample size is not reliable if the chains have not converged.

@michaelosthege it looks like MLDA and DEMetropolis have r_hat values of 1.00-1.02 and Metropolis up to 1.06. What is a good threshold to assess convergence?

@michaelosthege (Member) commented:

> @michaelosthege it looks like MLDA and DEMetropolis have r_hat values of 1.00-1.02 and Metropolis up to 1.06. What is a good threshold to assess convergence?

Depends. For the benchmarks, I'd say something like 1.03. The Metropolis trace didn't look that great.
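For reference, one way to apply such a threshold with ArviZ (az.rhat is the ArviZ API; trace is an assumed existing trace object):

```python
import arviz as az

rhat = az.rhat(trace)  # an xarray.Dataset with one r_hat value per variable
# Only trust ESS-based comparisons if every r_hat is below the threshold.
converged = all(float(v.max()) < 1.03 for v in rhat.data_vars.values())
```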

@michaelosthege (Member) commented May 26, 2020

The DEMetropolis sampler doesn't perform well with few chains. Try DEMetropolisZ instead; it can really outperform DEMetropolis and parallelizes much better.

It does need enough tuning though. Initializing chains at the MAP often helps. See the example.
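A minimal sketch of that MAP initialization (pm.find_MAP and the start argument of pm.sample are standard PyMC3 API; the toy model only makes the snippet self-contained):

```python
import numpy as np
import pymc3 as pm

data = np.random.randn(100)

with pm.Model():
    mu = pm.Normal("mu", mu=0, sigma=10)
    pm.Normal("obs", mu=mu, sigma=1, observed=data)
    start = pm.find_MAP()  # optimize to the posterior mode
    trace = pm.sample(draws=5000, tune=5000, step=pm.DEMetropolisZ(),
                      start=start, chains=1)
```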

@gmingas (Contributor, Author) commented May 27, 2020

Also, check out this topic in the PyMC3 Discourse; it is related to the additional feature I am looking to add for adaptive error correction in the coarse likelihoods.

@gmingas (Contributor, Author) commented May 27, 2020

> The DEMetropolis sampler doesn't perform well with few chains. Try DEMetropolisZ instead; it can really outperform DEMetropolis and parallelizes much better.

> Depends. For the benchmarks, I'd say something like 1.03. The Metropolis trace didn't look that great.

Cool, I will do longer runs and compare with DEMetropolisZ too.

@gmingas (Contributor, Author) commented Jun 19, 2020

Sorry for the big delay. I have now updated this notebook to contain a comparison with DEMetropolisZ and longer runs. All the r-hats are now very close to 1.0 and the traces look decent. Also, I am now initialising all samplers at the true values of the parameters (which should be quite close to the MLE), although I have not seen any big change from doing that.

I am also preparing a second notebook similar to the example that @michaelosthege linked to above to benchmark more model configurations and also demonstrate MLDA's tuning process.


@gmingas (Contributor, Author) commented Jun 23, 2020

I have also added this notebook, which contains some extra benchmarks and a demonstration of how MLDA performance changes with one of the tuning parameters (subsampling_rate).

You can also now pick a different subsampling rate for each level (see the sketch below).
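A hedged sketch of that per-level option, assuming a subsampling_rates argument that takes one rate per coarse level (fine_model, coarse_model_0 and coarse_model_1 are assumed existing models, coarsest first):

```python
import pymc3 as pm

with fine_model:
    step = pm.MLDA(
        coarse_models=[coarse_model_0, coarse_model_1],  # coarsest first
        subsampling_rates=[5, 3],  # subchain length for each lower level
    )
    trace = pm.sample(draws=2000, step=step, chains=1)
```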

@gmingas changed the title from "[WIP] Add MLDA stepper" to "Add MLDA stepper" on Jul 9, 2020
@gmingas (Contributor, Author) commented Jul 9, 2020

@michaelosthege @aseyboldt I have now removed the WIP from the title, as I think this is ready to be reviewed in its current form.

@junpenglao (Member) commented:

@volpatto did you have a chance to test it out?

@gmingas (Contributor, Author) commented Sep 14, 2020

This is odd. Tests are now passing here but still failing in my fork (with the message I posted earlier).

Also, codecov now fails, but the flagged results do not make sense (e.g. it says that metropolis.py coverage dropped by ~44%, but the PR modifies only a few spaces in Metropolis; also, some of the lines the tool thinks are untested are actually tested, and there was never a warning before).

@twiecki (Member) commented Sep 15, 2020

Yeah, codecov isn't always that reliable. We'll merge this, as I believe the issues you're seeing are unrelated to anything in this PR.

Congrats -- this is a massive contribution and a very high quality PR. Thank you guys so much!

@twiecki merged commit 07b584a into pymc-devs:master on Sep 15, 2020
@gmingas (Contributor, Author) commented Sep 15, 2020

That's great! Thank you @twiecki and everyone else for the feedback and support; this was a very positive experience for all of us and helped us understand and improve our method and code. We will keep working on improving and enhancing MLDA, so you will hear from us again very soon. Cheers!
