Add sample_generative function #2983
Conversation
springcoil
left a comment
LGTM - needs an addition to the RELEASE NOTES though.
|
Not for this PR, but a future PR could include some benchmarks, I think. Other than that, awesome work @ColCarroll |
pymc3/sampling.py
Outdated
`forward_val` is the numpy version of the `forward` function and should be faster.
Should use `.transformation.forward` here.
Oops, I meant `.transformation.forward_val` should be used here... It should be slightly faster:

`model[untransformed].transformation.forward(data[untransformed]).eval()` --> `model[untransformed].transformation.forward_val(data[untransformed])`
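The speed difference comes from skipping graph compilation. A minimal sketch of the idea in plain numpy (using the log-odds map of a (0, 1)-interval transform as an illustration; the function name is mine, not the pymc3 internals):

```python
import numpy as np

# `forward` builds a symbolic Theano expression that `.eval()` must compile
# and run; `forward_val` computes the same mapping directly in numpy.
# Illustrative example: the forward map of a (0, 1)-interval transform
# is the log-odds (logit) function.
def logodds_forward_val(x):
    """Numpy version: plain arithmetic, no graph compilation."""
    x = np.asarray(x, dtype=float)
    return np.log(x) - np.log1p(-x)

print(logodds_forward_val(0.5))  # 0.0
```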
pymc3/distributions/distribution.py
Outdated
the size arg is not working yet:

```python
from pymc3.distributions.distribution import draw_values

with pm.Model() as m:
    p = pm.Beta('p', 1., 1.)
    draw_values([m['p']], size=10)
# returns [array(0.72159403)] -- a single value instead of 10 draws
```
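For reference, the behaviour `size` is meant to have can be sketched in plain numpy (illustrative only, not the `draw_values` implementation): a request with `size=10` should return 10 independent draws rather than a single scalar.

```python
import numpy as np

# Illustrative sketch: drawing from a Beta(1, 1) prior with size=10
# should yield an array of 10 independent values, not one scalar.
rng = np.random.default_rng(0)
draws = rng.beta(1., 1., size=10)
print(draws.shape)  # (10,)
```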
your previous example is still a little funny too:

```python
X = theano.shared(np.arange(3))

with pm.Model() as m:
    ind = pm.Categorical('i', np.ones(3)/3)
    x = pm.Deterministic('X', X[ind])
    prior = pm.sample_generative()
```
I am really good at discovering edge cases.
pymc3/sampling.py
Outdated
Should we rename this `sample_prior_predictive`?
and `sample_ppc` should then probably be `sample_posterior_predictive` (it's not really doing a check).
I think this is now actually the generative function (see the test case). The prior predictive would also be nice to implement.
I looked at the test, but I don't understand the difference between generative and prior predictive here.
Ugh... I think I still don't. I will work out a simple model by hand later to try to help my intuition.
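One way to build that intuition with a hand-worked model (a sketch in plain numpy, not pymc3; the Gamma/Poisson model here is illustrative): prior samples are draws of the parameters, while prior predictive samples are draws of the observable data, obtained by pushing each parameter draw through the likelihood.

```python
import numpy as np

# Toy model: mu ~ Gamma(2, 1); goals ~ Poisson(mu).
rng = np.random.default_rng(42)

mu = rng.gamma(shape=2., scale=1., size=5000)  # prior draws of the parameter
goals = rng.poisson(mu)                        # prior predictive draws of the data

# The predictive is overdispersed relative to any single Poisson(mu):
# Var(goals) = E[mu] + Var(mu) = 4, while E[goals] = E[mu] = 2.
print(round(goals.mean(), 1), round(goals.var(), 1))
```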
|
I agree with the renaming
|
pymc3/distributions/distribution.py
Outdated
I think that `isinstance` check should be replaced with a check for the name.
pymc3/tests/test_sampling.py
Outdated
That seems a bit crude; couldn't we just test for the mean and sd?
|
I am coming to this discussion late, but I was talking with Chris Fonnesbeck about this issue and he pointed me to this PR. This functionality would be very helpful to me. I am working on a new volume of Think Bayes that will use PyMC, and I plan to demonstrate a development process that uses a `pm.Model` to generate a prior predictive distribution before running an inference. I think this is useful for both validation (the model makes sense) and verification (the model specification is correct).

Especially for verification, it would be ideal if generation and inference are as identical as possible. For example, I would love to be able to run

```python
with pm.Model() as model:
    mu = pm.Gamma('mu', alpha, beta)
    goals = pm.Poisson('goals', mu)
    trace = pm.sample(1000)
```

to generate the prior predictives, and then

```python
with pm.Model() as model:
    mu = pm.Gamma('mu', alpha, beta)
    goals = pm.Poisson('goals', mu, observed=6)
    trace = pm.sample(1000)
```

If the first one validates, it would be really hard to mess up the second one. This API would require `pm.sample` to analyze the model, see that there is no data, and run `sample_prior_predictive` rather than any of the MCMC samplers. I assume that would not be hard, but I don't really know.

This notebook contains more examples of what I have in mind: https://github.com/AllenDowney/BayesMadeSimple/blob/master/zigzag.ipynb

In those examples I use `pm.sample` to generate, but (you will not be surprised to hear) it doesn't work very well. @ColCarroll, since you saw the talk where I presented this notebook, you might have thoughts to add.

Thank you all! |
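The dispatch described here could look something like the following (a toy sketch with a stand-in model class; `ToyModel` and the return strings are mine, not the pymc3 API): `pm.sample` would inspect the model for observed variables and fall back to forward simulation when there are none.

```python
# Toy sketch of the proposed dispatch; ToyModel and the return values
# are stand-ins, not pymc3 objects.
class ToyModel:
    def __init__(self, observed=None):
        self.observed_vars = [] if observed is None else [observed]

def sample(model, draws=1000):
    if model.observed_vars:          # data present: run an MCMC sampler
        return "mcmc"
    return "prior_predictive"        # no data: forward-simulate the prior

print(sample(ToyModel(observed=6)))  # mcmc
print(sample(ToyModel()))            # prior_predictive
```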
|
Fascinating idea!!!
|
|
So I wonder if |
|
I think there are lots of situations where we want to sample from a joint (unobserved) distribution that we describe using the machinery of pymc: pymc allows one to describe models in terms of random variables and functions of them, which is the right level of abstraction for that kind of work. One example: Monte Carlo-based probabilistic sensitivity analysis of an economic model. |
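As a concrete (and entirely made-up) instance of that use case, a probabilistic sensitivity analysis is just forward Monte Carlo over uncertain inputs of a deterministic model, with no observed data anywhere:

```python
import numpy as np

# Made-up economic model: profit = demand * (price - unit_cost), with
# uncertain demand and unit cost; the distributions are illustrative.
rng = np.random.default_rng(1)
n = 10_000

unit_cost = rng.lognormal(mean=2.0, sigma=0.3, size=n)  # uncertain input
demand = rng.gamma(shape=5.0, scale=100.0, size=n)      # uncertain input
price = 10.0

profit = demand * (price - unit_cost)

# Sensitivity summary: quantiles of the induced profit distribution.
print(np.percentile(profit, [5, 50, 95]))
```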
|
Another example: for performance reasons, one may sometimes apply Bayes' rule analytically with a known conjugate prior. E.g., perhaps it's computationally faster to model the posterior RVs directly, rather than include the prior and observations. In such cases, the model won't have observed / likelihood nodes. |
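A worked instance of that shortcut (a sketch with illustrative numbers, not from the PR): with a conjugate Beta prior on a Bernoulli success probability, Bayes' rule can be applied analytically and the posterior RV modelled directly.

```python
import numpy as np

# Beta-Binomial conjugacy: prior Beta(a, b) and k successes in n trials
# give posterior Beta(a + k, b + n - k), so we can sample that posterior
# directly instead of running inference on prior + observations.
a, b = 1.0, 1.0
k, n = 7, 10

rng = np.random.default_rng(0)
posterior = rng.beta(a + k, b + n - k, size=5000)

# Analytic posterior mean is (a + k) / (a + b + n) = 8 / 12 ~ 0.667
print(round(posterior.mean(), 3))
```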
|
Right, but those situations don’t require MCMC, do they? Just forward simulation using independent sampling. |
|
Agreed, but my point is that such folks ought to use pymc rather than write their own model description / independent sampling code. |
|
Yeah I think we are talking about the same thing. Following on Allen’s point, we could engineer |
|
@AllenDowney I actually started using your notebook as a test case on the train home last night! I think the issues brought up here with My goal for the api is more like which allows for model reuse. There's no reason your approach could not also be supported, though it makes me a little nervous to overload a function like that: in particular, silently ignoring arguments like |
|
That makes sense: the parameters of To be completely systematic, you might want four functions:
But maybe (1) is not necessary; if you're going to run the model forward, you might as well generate both the priors and the prior predictives. If you like shorter names, maybe |
|
I like this naming scheme. For completeness we should indeed add |
6c9d38f to
626f074
Compare
|
I'm at a talk by Mike Betancourt tonight. Is this implemented yet? |
|
Have been making surprisingly steady progress. I'll push a more complete description of the changes when it is ready (tomorrow?), but will push something partial tonight. This has turned into a partial rewrite of all shape handling, which runs through `generate_samples`. I think what I have is much cleaner now, and ironically most of what is left is cleaning up all the code that deals with bad behavior in the old/current implementation. See, for example, `Multinomial._random`.

I do have it so that @AllenDowney's notebook runs cleanly without "abusing" PyMC3 😄 ! |
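For a rough sense of what routing all shape handling through one function means, here is an illustrative sketch (the name and signature are mine, not pymc3's `generate_samples`): the core job is to combine a requested number of draws with the distribution's own parameter shape in a single code path.

```python
import numpy as np

# Illustrative helper: prepend the requested draw count to the
# distribution's parameter shape, so every distribution's random()
# can delegate its shape logic to one place.
def generate_samples_sketch(sampler, params_shape=(), size=None):
    shape = ((size,) if size is not None else ()) + tuple(params_shape)
    return sampler(size=shape)

rng = np.random.default_rng(0)
out = generate_samples_sketch(rng.normal, params_shape=(3,), size=500)
print(out.shape)  # (500, 3)
```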
|
That's awesome. Thank you, Colin!
|
|
Wicked!!!
|
22147c4 to
d03ab9e
Compare
|
Oddly, I think this is nearly ready to merge. There were a few unexpected tests failing, and I think a few distributions (notably, Everything is quite fast for I do have a modestly embarrassing question about nomenclature, and what the prior vs prior predictive is. In the example below, would I be able to define |
|
My understanding is that prior predictive will return: The current output from the function is what I would expect. |
|
Found some small things. I'm very excited about having this functionality. It definitely needs a release note as well as an example, though. |
|
Thanks for those! I'd prefer to add an example in a separate PR, if that is ok. This touches so much code in subtle ways... |
542e73a to
c12af1a
Compare
|
Thank you all for your work on this, especially @ColCarroll!
Do you know when this will make it into a release (or has it already)?
…On Mon, Jun 18, 2018 at 1:01 AM, Junpeng Lao wrote: Merged #2983.
|
|
It will be in the next release. @fonnesbeck @twiecki maybe we should actually prepare a release; we have quite a few new features. |
|
I've been using this function and the graph function daily-ish and ironing out the bugs, which are shape-related, obviously. |
|
Are there any non-shape bugs? The answer to "should we do a new release?" is always yes. |
|
OK, I will get a release ready over the weekend. |

This replaces #2876, since @springcoil's recent improvements to `draw_values` (#2979) make this somewhat easier. Includes a test, plus two more I had written for `draw_values`. Both of them actually failed without changes here. This method is quite fast now.