Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Adding support for adaptation of a dense mass matrix based on sample covariances #3596

Merged
merged 12 commits into from
Dec 4, 2019
Merged

Conversation

dfm
Copy link
Contributor

@dfm dfm commented Aug 19, 2019

This pull request suggests an interface for full mass matrix adaptation based on an online estimate of the sample covariance. This is similar to the existing QuadPotentialDiagAdapt implementation except it updates the full covariance matrix.

I have blogged an example where the performance of the default adaptation routine is much poorer than it should be. While this example might seem a bit contrived, I've found that that dense mass matrix adaptation is absolutely crucial for many of the applications that I've been considering in astronomy (for example: the exoplanet library).

So based on this experience, I took a quick stab at implementing an interface like this directly into pymc3 and I'd love to get your feedback!

I also posted a short demo based on the blog post implemented using this pull request: https://gist.github.com/dfm/da1d0470d6fb54c63e6a913c1ef67a9e

@junpenglao
Copy link
Member

junpenglao commented Aug 20, 2019

Thanks for the pull request! This is a great contribution!!!

Copy link
Member

@aseyboldt aseyboldt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks nice.
I'd love to have a tests that also samples using the new adaptation.
Eventually, we should expose this as adapt='jitter+adapt_dense' in pm.sample, but for now I'd rather hide it a bit, so that we can do some testing. I'd also add a warning that this is experimental for now.
There are a couple of improvements I could think of:

  • Is it worth it to update the covariance in every step during tuning? It will have to do a cholesky decomposition each time.
  • We never need the full matrix, but only matrix products and products with a decomposition. So we could just store the (k, n) matrix V of samples (dim n, num samples k) and then do a svd of that. We could compute the matrix product as (I + sigma VV.T), and use the svd to draw samples.
  • What are good window sizes? This needs testing.

But I wouldn't let either of those stop us from merging this.

pymc3/step_methods/hmc/quadpotential.py Outdated Show resolved Hide resolved
@aseyboldt
Copy link
Member

@dfm I wrote a very experimental version of mass matrix adaptation that uses low-rank approximations for the mass matrix, so it does not need to store a whole n by n matrix. I'd be interested to hear if this helps for your models.

https://github.com/aseyboldt/covadapt

@dfm
Copy link
Contributor Author

dfm commented Sep 2, 2019

@aseyboldt: Thanks for the feedback! Your covadapt project looks awesome and I'll definitely take it for a test drive.

@dfm
Copy link
Contributor Author

dfm commented Sep 2, 2019

Is it worth it to update the covariance in every step during tuning? It will have to do a cholesky decomposition each time.

In my experience, it tends to be fine to just update the matrix at the end of each adaptation window (that's what Stan does too) but I couldn't figure out how to incorporate that with the current tuning implementation in PyMC3 because the step doesn't seem to know anything about how long tuning is and when it is going to finish. This seemed like a problem because then the last adaptation window might finish after tuning and the covariance estimate might never be updated based on the longest run. Does that make sense and is there something I'm missing?

We never need the full matrix, but only matrix products and products with a decomposition. So we could just store the (k, n) matrix V of samples (dim n, num samples k) and then do a svd of that. We could compute the matrix product as (I + sigma VV.T), and use the svd to draw samples.

This is a good suggestion. Would you rather try to merge or point people to covadapt instead of something like this pull request?

What are good window sizes? This needs testing.

Agreed!

@codecov
Copy link

codecov bot commented Dec 3, 2019

Codecov Report

Merging #3596 into master will increase coverage by 0.04%.
The diff coverage is 94.82%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3596      +/-   ##
==========================================
+ Coverage    89.9%   89.94%   +0.04%     
==========================================
  Files         134      134              
  Lines       20269    20430     +161     
==========================================
+ Hits        18222    18376     +154     
- Misses       2047     2054       +7
Impacted Files Coverage Δ
pymc3/step_methods/hmc/quadpotential.py 73.75% <92.39%> (+5.41%) ⬆️
pymc3/tests/test_quadpotential.py 95.45% <97.56%> (+3.53%) ⬆️

@dfm
Copy link
Contributor Author

dfm commented Dec 3, 2019

@aseyboldt:

  • I added an option that allows the factorization to be recomputed on a different interval (e.g. only every 10 tuning steps) which isn't perfect, but I don't see an obvious better method in the current sampling implementation.
  • I added a few more tests to make sure that things work as expected and that pm.sample executes.
  • There is now a UserWarning with an "experimental" warning when a QuadPotentialFullAdapt is instantiated.
  • I've found that the default window sizes work well for all the problems I've worked on (we've been using it in exoplanet for a few versions now) but I don't have any sense of what kinds of tests you'd want in general.

Let me know if there's something you'd like to see added/removed or if you'd rather skip this and encourage people to use covadapt instead. Thanks!

Copy link
Member

@eigenfoo eigenfoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A minor nitpick and a comment - thanks for the PR @dfm!

pymc3/step_methods/hmc/quadpotential.py Outdated Show resolved Hide resolved
pymc3/step_methods/hmc/quadpotential.py Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants