Add MLDA stepper #3926
Conversation
This looks really cool and massive @gmingas. Thanks for the contribution. I'll wait to see if someone jumps in to check this out in more detail. Some questions:
Adding @aseyboldt as reviewer as he has been working with PDEs.
In its current form the stepper is usable and has benefits for the types of problems described above (also see below). There are a few extra features that I would like to add in the near future. These will be enhancements/add-ons rather than refactoring/rewriting, so it might be worth reviewing it in its current form. I published this as WIP to see initial reactions and prioritise next steps. I can remove the WIP from the title if this makes sense. In terms of feedback, I was thinking about a few things:
In terms of applicability, it does not require gradients. It does require that you provide at least one coarse (approximate) model in addition to the main model. Despite that, I think the sampler is general enough to live inside PyMC3, as many people work with likelihoods where an approximate/coarse version can be defined: inverse problems for various physical systems where spatial resolution can vary and PDEs usually need to be solved (e.g. subsurface fluid transport, the resistivity/conductivity field in impedance imaging, ultrasound imaging, emission tomography, the flow field in ocean circulation). MLDA can reduce the computational cost in those cases. There are also other problems where an approximate model can be defined (it would be good to hear about possible applications from people). Note also that in many of these problems it might not be easy to get gradients for use with NUTS. A comparison with DE-Metropolis is indeed needed and I will work on this. Apart from the ES/sec metric, what other metrics would you like to have for benchmarking and scaling exploration?
Sorry for including myself in this discussion (I'm not a PyMC3 dev, but a user); I think this feature is absolutely amazing! This is related to the MLMC work Mike Giles has been doing, right? That's fantastic; the only library I know that provides it in a generic way is QUESO. Just for reference, I did a post on the PyMC3 discourse about it. If you are looking for some user feedback, I can try it out on recent work I have been doing in a COVID-19 research group, with the potential to change our current method (SMC from PyMC3) to your new method. The group has previous experience using MLMC from QUESO. And it's nice that you tested it with FEniCS. I use Firedrake in my PhD, which also uses UFL and has almost the same interface. This PR helps me a lot in several ways; thank you for your effort.
@volpatto That's valuable feedback. Certainly, if you'd be willing to help test and review, it would be very helpful and much appreciated.
Thanks, @twiecki! I don't know PyMC3 internals well, but I think I can give some help (with my experience as a Python developer). Actually this is interesting because I have been meaning to try to implement MLMC in PyMC3 (a basic version; the one in this PR is improved, I need to study it), and I think it can be really useful.
pymc3/step_methods/metropolis.py
"""One MLDA step, given current sample q0""" | ||
# Check if the tuning flag has been changed and if yes, | ||
# change the proposal's tuning flag and reset self.accepted | ||
# This is triggered by iter_sample while the highest-level MLDA step | ||
# method is running. It then propagates to all levels. | ||
if self.proposal_dist.tune != self.tune: | ||
self.proposal_dist.tune = self.tune | ||
# set tune in sub-methods of compound stepper explicitly because | ||
# it is not set within sample() (only the CompoundStep's tune flag is) | ||
if isinstance(self.next_step_method, CompoundStep): | ||
for method in self.next_step_method.methods: | ||
method.tune = self.tune | ||
self.accepted = 0 |
Here, if the bottom-level Metropolis methods are within a CompoundStep, I manually set their tune flags.
CompoundStep has a stop_tuning() function that is called by _iter_sample to set tune=False for all the methods contained in it. But the Metropolis class does not have a stop_tuning() function, so the Metropolis flags are not set.
If there is no reason not to, should I add a stop_tuning() to Metropolis?
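A minimal sketch of what that could look like (hypothetical simplification; the real PyMC3 classes carry many more attributes), mirroring the `stop_tuning()` hook that `CompoundStep` already provides so `_iter_sample` can treat both uniformly:

```python
# Hypothetical sketch: give Metropolis the same stop_tuning() hook that
# CompoundStep has, so a caller like _iter_sample can shut tuning off
# without knowing which kind of step method it holds.
class Metropolis:
    def __init__(self):
        self.tune = True  # tuning stays on until explicitly stopped

    def stop_tuning(self):
        self.tune = False


class CompoundStep:
    def __init__(self, methods):
        self.methods = methods

    def stop_tuning(self):
        # Propagates to every contained method, including Metropolis.
        for method in self.methods:
            method.stop_tuning()
```

With this hook in place, the manual flag propagation in the snippet above would not be needed for the tuning shut-off case.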
@volpatto thanks for your interest in our contribution. Yes, it builds directly on Mike Giles' work; his original work on Multilevel Monte Carlo (MLMC) addressed just the forward problem. This was then extended to the Bayesian setting, aka Multilevel Markov Chain Monte Carlo. Multilevel delayed acceptance is very close to this, and the connection is discussed in the MLMCMC paper. A paper will follow soon, but ask if you have any questions.
@volpatto thanks for the comments. It would be great if you could try it out on the COVID work you mentioned. I would be interested in seeing if you notice any unexpected behaviour or bugs, and what extra features and improvements your application would benefit from. Also, how performance compares to SMC and QUESO. We could potentially add new examples and tests based on your application.
I see! I have been reading this paper (btw, nicely done) in order to get a proper understanding of your method. As soon as I can I'll come back here to help, hopefully this week.
Amazing! Looking forward to doing some tests with MLDA. I'll prepare a simplified SIRD/SEIRD model calibration and forward UQ with SMC for us, so we can compare it with MLDA. Would it be helpful if I fork your PyMC3 fork and submit a PR (with the ipynb) pointing to this branch, @gmingas? Cheers!
@volpatto Sure, this sounds great if you are willing to do it! Let me know if you don't understand something or if I can help in any other way.
I have added a comparison with DEMetropolis in the existing notebook here. I also included a second notebook here with the same comparison done using blocked sampling for Metropolis and MLDA.
Hi, I'm currently just on the phone, but skimming through the notebook.. |
@michaelosthege it looks like MLDA and DEMetropolis have r_hat 1.00-1.02 and Metropolis up to 1.06. What is a good threshold to assess convergence?
Depends.. For the benchmarks, I'd say something like 1.03. The Metropolis trace didn't look that great..
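For reference, the split-R̂ diagnostic behind these numbers can be sketched in a few lines (a simplified illustration; ArviZ's production implementation adds rank-normalization and other refinements):

```python
import numpy as np

def split_rhat(chains):
    """Simplified split-R-hat sketch: `chains` has shape
    (n_chains, n_draws). Values near 1.0 suggest convergence;
    above ~1.03 (the threshold mentioned above) is suspect."""
    n_chains, n_draws = chains.shape
    half = n_draws // 2
    # Split each chain in half so within-chain drift also shows up
    # as apparent between-chain disagreement.
    s = chains[:, :2 * half].reshape(n_chains * 2, half)
    n = s.shape[1]
    chain_means = s.mean(axis=1)
    B = n * chain_means.var(ddof=1)        # between-chain variance
    W = s.var(axis=1, ddof=1).mean()       # mean within-chain variance
    var_hat = (n - 1) / n * W + B / n      # pooled variance estimate
    return float(np.sqrt(var_hat / W))
```

Stationary, well-mixed chains give values very close to 1.0; chains stuck in different regions inflate B relative to W and push R̂ well above 1.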
It does need enough tuning though. Initializing chains at the MAP often helps. See the example.
Also, check this topic in the PyMC3 discourse; it is related to an additional feature I am looking to add: adaptive error correction in the coarse likelihoods.
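The adaptive error correction idea from Cui et al. [2] can be sketched roughly like this (a toy illustration, not the planned PyMC3 implementation; class and method names are ours): keep a running estimate of the fine-minus-coarse model discrepancy and add it to the coarse model output before evaluating the coarse likelihood.

```python
import numpy as np

class AdaptiveErrorModel:
    """Toy sketch of adaptive error correction: track a running mean
    of the discrepancy between fine and coarse model outputs, and use
    it to correct the coarse output before likelihood evaluation."""
    def __init__(self, dim):
        self.mu = np.zeros(dim)  # running mean of the discrepancy
        self.n = 0

    def update(self, fine_out, coarse_out):
        # Incremental running-mean update with each observed pair.
        self.n += 1
        delta = (fine_out - coarse_out) - self.mu
        self.mu += delta / self.n

    def corrected(self, coarse_out):
        # Bias-corrected coarse output used inside the coarse likelihood.
        return coarse_out + self.mu
```

In the full scheme the correction would also track the discrepancy covariance, but the bias term alone already conveys the idea.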
Cool, I will do longer runs and compare with DEMetropolisZ too. |
Sorry for the big delay. I have now updated this notebook to contain a comparison with DEMCMC-Z and also longer runs. All the r-hats are now very close to 1.0 and the traces look decent. I am also initialising all samplers at the true values of the parameters (which should be quite close to the MLE), although I have not seen any big change from doing that. I am also preparing a second notebook, similar to the example that @michaelosthege linked to above, to benchmark more model configurations and demonstrate MLDA's tuning process.
I have also added this notebook, which contains some extra benchmarks and a demonstration of how MLDA performance changes with one of the tuning parameters (subsampling_rate). You can also now pick a different subsampling rate for each level.
@michaelosthege @aseyboldt I have now removed the WIP from the title as I think this is ready to be reviewed in its current form.
@volpatto did you have a chance to test it out?
…benchmarking one)
…or stats and also test for declining acceptance rate
This is done because of the bug described in arviz-devs/arviz#1279. The commit also changes a few parameters in the MLDA .py example to match the ones in the equivalent notebook.
…new MetropolisMLDA class which inherits Metropolis and just alters the reset_tuning function, modify test_step and __init__ files accordingly
…enics and delete reference to example from MLDA class docstring
…ods, fix some stylistic issues
…g point for new users
… data, more comments, performance comparison with Metropolis
…ark, arviz style, imports, etc)
This is odd. Tests are now passing here but still failing in my fork (with the message I posted earlier). Also, codecov now fails but the flagged issues do not make sense (e.g. it says that metropolis.py coverage dropped ~44%, but the PR modifies only a few spaces in Metropolis; also, some of the lines the tool thinks are not tested are actually tested, and there was never a warning before).
Yeah, codecov isn't always that reliable. We'll merge this, as I believe the issues you're seeing are unrelated to anything in this PR. Congrats: this is a massive contribution and a very high quality PR, thank you guys so much for your contribution!
That's great! Thank you @twiecki and everyone else for the feedback and support, this was a very positive experience for all of us and helped us understand and improve our method and code. We will keep working on improving/enhancing MLDA, you will hear from us again very soon. Cheers!
The MLDA stepper
The MLDA (Multi-Level Delayed Acceptance) [1] method employs a hierarchy of chains to sample from a posterior. The top-level chain samples from the fine model posterior (i.e. the most accurate one / the one we are interested in) and the lower level chains sample from coarse model posteriors (i.e. approximations of declining accuracy).
MLDA uses samples generated in one level as proposals for the level above. A chain runs for a fixed number of iterations and then the last sample is used as the proposal for the higher-level chain. The bottom level is a Metropolis sampler, although an algorithm like pCN or adaptive MH is expected to be used in the future.
The stepper is suitable for situations where a model is expensive to evaluate at high spatial or temporal resolution (e.g. realistic subsurface flow models where a PDE needs to be solved in each iteration). In those cases we can use cheaper, lower-resolution versions as coarse models. If the approximations are sufficiently good, this leads to good-quality proposals, high acceptance rates and high ES/sec compared to other methods, because only a few expensive fine-model solves are needed to achieve the same ESS.
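The core two-level mechanism can be sketched in plain Python (a toy illustration of the delayed-acceptance idea, not the PyMC3 implementation; the function name, the simple random-walk inner proposal, and all parameter names are ours):

```python
import numpy as np

def mlda_two_level(log_post_fine, log_post_coarse, x0, n_samples,
                   subsampling_rate=5, step=0.5, rng=None):
    """Toy two-level delayed-acceptance sampler: a short coarse-level
    Metropolis subchain generates each proposal for the fine chain."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        # Run the coarse subchain for a fixed number of iterations,
        # starting from the current fine-level state.
        y = x.copy()
        for _ in range(subsampling_rate):
            z = y + step * rng.standard_normal(y.shape)
            if np.log(rng.random()) < log_post_coarse(z) - log_post_coarse(y):
                y = z
        # Delayed acceptance: accept the subchain endpoint with a ratio
        # that corrects for the coarse approximation error.
        log_alpha = (log_post_fine(y) - log_post_fine(x)
                     - (log_post_coarse(y) - log_post_coarse(x)))
        if np.log(rng.random()) < log_alpha:
            x = y
        samples.append(x.copy())
    return np.array(samples)
```

Note that when fine and coarse posteriors coincide, the correction ratio is exactly 1 and every subchain endpoint is accepted; the better the coarse approximation, the higher the fine-level acceptance rate.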
Changes
- `MLDA` is a new `ArrayStepShared` step method in `metropolis.py`. Its constructor instantiates step methods for all the levels in the hierarchy.
- `RecursiveDA` is a new `Proposal` method in `metropolis.py`. It recursively calls `sample` to run the hierarchy of chains in multiple levels, using the step methods instantiated by `MLDA.__init__`. Note that logging is switched off here to avoid unwanted console output. Each `sample()` call generates a fixed number of samples; we then keep only the last one and return control to the level above. We rerun `sample()` from that point each time control goes back to that level.
- `is_mlda_base` is added to `Metropolis`. It is detected within `sample()` to avoid running `reset_tuning()`. This is done because we want the `Metropolis` step in the bottom level to maintain its tuning information across subsequent `sample()` calls. Note that resetting was introduced by #3733 (Metropolis chain tuning differs between single- and multiprocessing). MLDA is not affected by the issue there, as it always uses `cores=1` and `chains=1` in each level of the hierarchy. A more elegant solution might be to detect `cores` and `chains` and pass the info to `_iter_sample` to decide whether `reset_tuning` runs.
- Tests for `MLDA` and `RecursiveDA`, and two new models for testing in `tests/models.py`.
Performance
Running the notebook on a MacBook Pro (see specs within), with 3 unknown parameters and model resolutions of (30, 30) and (120, 120), the resulting performance comparison is shown in the notebook.
Work in progress
- The stepper stops and restarts `sample()` multiple times. Stopping and restarting is necessary as I need to switch between chains and continue a chain from where it last left off. Are there faster ways to pause and resume a chain than saving the trace/tuning and calling a new `sample()` afterwards? Would `iter_sample` be more efficient in this case?
Usage
Can be used like any other step method, but one positional argument needs to be provided (`coarse_models`). See the notebook and tests for examples.
References
[1] Dodwell, T., Ketelsen, C., Scheichl, R. & Teckentrup, A. (2019). Multilevel Markov Chain Monte Carlo. SIAM Review, 61, 509-545. https://doi.org/10.1137/19M126966X
[2] Cui, T., Fox, C. & O'Sullivan, M. (2012). Adaptive Error Modelling in MCMC Sampling for Large Scale Inverse Problems.