
Proposal: Dist shape refactor #1125

Closed
wants to merge 17 commits

Conversation

@twiecki (Member) commented May 23, 2016

No description provided.

@twiecki changed the title from "Dist shape refactor" to "Proposal: Dist shape refactor" on May 23, 2016
@@ -194,7 +210,7 @@ def logp(self, value):
                  tau > 0, sd > 0)


-class HalfNormal(PositiveContinuous):
+class HalfNormal(PositiveUnivariateContinuous):
@twiecki (Member Author):

This might be a case where we want to start using multiple inheritance, e.g. HalfNormal(Positive, Univariate, Continuous)

@brandonwillard (Contributor) commented May 23, 2016:

Yeah, I thought so, too, but the old Java side of me tries to avoid it.

@twiecki (Member Author):

It seems pretty innocuous here.

@twiecki (Member Author):

I think we should go that route here unless it would cause other unanticipated problems.

@brandonwillard (Contributor):

Started splitting those up in the most recent commit; however, it became apparent that constructor call order, and other confusing things, could arise in this scenario.
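
For reference, a minimal sketch of the mixin layout under discussion, using the class names from the suggestion above (the __init__ bodies are hypothetical); the cooperative super() chain is exactly where the constructor-call-order subtleties come from:

class Positive(object):
    def __init__(self, *args, **kwargs):
        # hypothetical: e.g. attach a log transform for positive support
        super(Positive, self).__init__(*args, **kwargs)

class Univariate(object):
    def __init__(self, *args, **kwargs):
        # hypothetical: e.g. declare scalar support, shape_supp = tuple()
        super(Univariate, self).__init__(*args, **kwargs)

class Continuous(object):
    def __init__(self, *args, **kwargs):
        super(Continuous, self).__init__(*args, **kwargs)

class HalfNormal(Positive, Univariate, Continuous):
    # MRO: HalfNormal -> Positive -> Univariate -> Continuous -> object.
    # Every __init__ must cooperate via super(), otherwise the chain breaks.
    pass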

@brandonwillard (Contributor):

These changes should help address #535.

@twiecki (Member Author) commented May 30, 2016:

@brandonwillard Are you going to make more changes (like the multiple inheritance) or want another round of review?

@brandonwillard (Contributor):

I planned on spending some part of tomorrow getting the current tests to pass, so I might have a lot to push soon.

@twiecki (Member Author) commented May 30, 2016:

Great! You'll probably also want to rebase.

@brandonwillard (Contributor) commented May 31, 2016:

@twiecki: To really go forward with symbolic scalars in RV shapes (again, the number of scalars is not symbolic, since Theano doesn't handle that), it looks like we would have to put some/all of the MCMC sampling logic into a Theano gof.Op.
The block-step code seems to depend on explicit shape values (within ArrayOrdering), and a different, non-Op approach to those operations doesn't immediately come to mind. Otherwise, compiling Theano functions to return the exact shapes seems lame, but feasible.
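
For what it's worth, the "compile Theano functions to return the exact shapes" fallback would look something like this minimal sketch:

import numpy as np
import theano
import theano.tensor as tt

x = tt.matrix('x')
# A compiled function whose only job is to recover concrete shape values.
shape_fn = theano.function([x], x.shape)
shape_fn(np.zeros((3, 4)))  # -> array([3, 4])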

We could always limit the symbolic shape capabilities by assuming that the shape scalars are always constant values; then we can keep the new Theano shape logic, add some steps to extract the constant values, and work toward more Theano integration as we go. In this case, the additions would simply serve to establish some organizational elements that are useful for multivariate work (i.e. the separate shape tuples for support, replications, etc.) and some Theano manipulations that attempt to perform automatic shape inference.

Any thoughts?

@twiecki (Member Author) commented May 31, 2016:

@brandonwillard

I think rewriting the samplers in Theano would make things way more complicated. Samplers have loops and if/then clauses, both of which are quite tricky in Theano. The benefit would be GPU sampling, but that would be its own project.

I think unified shape handling (even if constant) is a major improvement. Note that there are also inconsistencies in the univariate RVs (e.g. #1079) that hopefully can be fixed with this (is it already?).

And as you said, we can always move closer to symbolic shapes later on. Out of curiosity, would this allow changing the dimensionality of the model while sampling? Thinking of non-parametrics.

Should we take a closer look? It still needs to be rebased, which I suspect will cause some hiccups with the T -> tt change that we already have on master now.

> The latter should at least make

Seems like that stopped mid-way through.

@twiecki (Member Author) commented May 31, 2016:

4ccac8a: this commit doesn't seem to change anything with regard to random behavior?

@brandonwillard (Contributor) commented May 31, 2016:

Yeah, that change in the commit might just be my automatic formatting.

@brandonwillard (Contributor) commented May 31, 2016:

Regarding the samplers, we might not want/need to use the high-level Theano operations. I was actually thinking that we move to more low-level Theano, such as implementing a theano.Op. That way we might be able to put most of the current non-theano Python in a theano.Op.perform method. There would be a little work in matching up the inputs and outputs, but it could be worth it in the end.
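
Roughly, I mean something like the following sketch (the Op and the astep interface here are hypothetical, just to illustrate where the existing Python would live):

import numpy as np
import theano.tensor as tt
from theano.gof import Op, Apply

class SamplerStepOp(Op):
    """Hypothetical: wrap an existing Python-side step method in an Op."""

    def __init__(self, step):
        # `step` is assumed to expose an `astep(point) -> point` method.
        self.step = step

    def make_node(self, q0):
        q0 = tt.as_tensor_variable(q0)
        return Apply(self, [q0], [q0.type()])

    def perform(self, node, inputs, output_storage):
        # All the current non-Theano sampling Python would run here.
        (q0,) = inputs
        output_storage[0][0] = np.asarray(self.step.astep(q0))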

@brandonwillard (Contributor) commented May 31, 2016:

Oh, and #1079 would be addressed by these changes (might need the associated changes to the Distribution.random framework, though), since it specifies an order of dimensions for all distributions (i.e. shape_reps + shape_ind + shape_supp).

For the exact problem in #1079, these changes intend to allow the following:

import numpy as np
import pymc3 as pm

# artificial data
k = 2
ndata = 10
data = np.array([1, 2.2, 2.5, 2.8, 4, 6, 7.2, 7.5, 7.8, 9])

# model
alpha = 0.1 * np.ones((ndata, k))
with pm.Model() as model:
    # Dirichlet will know its support is a vector of length k, i.e. shape_supp = (k,),
    # and that there are ndata-many independent copies, i.e. shape_ind = (ndata,),
    # finally, there are no replications, so shape_reps = tuple()
    p = pm.Dirichlet('p', alpha)
    # Notice below that the "shape" keyword has been exchanged for "size", and 
    # both Normal and Uniform have scalar support, i.e. shape_supp = tuple(), so 
    # we've really just specified k-many replications, i.e. shape_reps = (k,).
    # Also, shape_ind = tuple(), but would generally be whatever shape the parameters
    # are.
    mu = pm.Normal('mu', mu=5, sd=3, size=k)
    sd = pm.Uniform('sd', lower=0.1, upper=0.5, size=k)
    # Now, the following should work, since shape_supp = p.shape_supp,
    # shape_ind = p.shape_ind, shape_reps = p.shape_reps.
    categ = pm.Categorical('categ', p=p)  

    ...

The shapes for the last statement involving the Categorical distribution could be obtained directly, and in a generic way, from its parameter p, with shape_supp = p.shape[-1] and shape_ind = p.shape[:-1], since we know--by definition--that the Categorical distribution has vector support.
If we tried this in the current codebase, p.shape[-1] would return a symbolic Theano scalar object and break things down the line.
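
Spelled out as code (a sketch; note that slicing with [-1:] keeps a 1-d shape piece matching the tuple convention above, whereas p.shape[-1] alone is a scalar):

# Sketch: deriving the Categorical's shape components from p alone.
shape_supp = p.shape[-1:]  # the support shape, i.e. (k,)
shape_ind = p.shape[:-1]   # the independent-copy shape, i.e. (ndata,)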


Parameters
----------
shape_supp
@twiecki (Member Author):

This needs to be reformatted eventually to match the numpy-doc style.

@brandonwillard (Contributor):

No problem. Documenting is much easier with a standard!

@twiecki (Member Author) commented Jun 1, 2016:

OK, I really like where this is going. The example is super helpful. Would you mind also adding one with multiple multivariate distributions? #1153 seems to be another good example.

Also want to get @fonnesbeck to take a look at the API.

So if I passed pm.Dirichlet('p', alpha.T), would shape_supp be ndata and shape_ind be k? What if I did pm.Dirichlet('p', 0.1 * np.ones(k), size=ndata) -- would that be equivalent?

I'm not sure I fully grok the difference between shape_ind and shape_reps. For the Dirichlet, couldn't ndata be the shape_reps?

What's left to do here? Certainly changing the examples (I can help with that), adding some unittests, random support (do you think this will be tricky?).

@brandonwillard (Contributor) commented Jun 1, 2016:

Yeah, in many cases shape_ind and shape_reps will serve the same purpose, but the distinction should be made. For example, the issue I was running into with bounded/truncated sampling arose because there was no distinction. Again, you can efficiently perform rejection sampling on many independent, identically distributed variables, but as soon as some of those variables have non-identical distributions (e.g. different means) you must handle those separately.
When someone instantiates a univariate distribution, like Normal, with non-scalar parameters, I'm just automatically assuming that the parameter tensors could have different values, making them independent but not identical. The size parameter, in contrast, explicitly serves to produce independent and identical copies of a distribution, since there's no way of specifying different distribution parameters with size alone.
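
Concretely, under the size keyword proposed here (a sketch; the variable names are just illustrative):

import numpy as np
import pymc3 as pm

k = 3
with pm.Model():
    # k-many iid copies of one distribution:
    # shape_reps = (k,), shape_ind = tuple()
    mu_reps = pm.Normal('mu_reps', mu=0., sd=1., size=k)
    # k-many independent but non-identical normals (a vector of means):
    # shape_ind = (k,), shape_reps = tuple()
    mu_ind = pm.Normal('mu_ind', mu=np.arange(k, dtype='float64'), sd=1.)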

To get this truly working, we need to address every current use of the shape information and either Theano-ize those parts--if possible--or restrict the symbolic scalar functionality by forcibly extracting the constant values from the shape scalars (and raising errors if they aren't effectively constant). Similarly, we could create Theano functions that evaluate the symbolic shapes in those non-Theano places where they're used. Of course, those Theano functions should not take arguments (e.g. other random variables, etc.), but perhaps they could depend on shared variables, which, I believe, can be evaluated at any time.
The random support is probably just a lot of typing and connecting the dots with Theano's random functions. However, there is the matter of symbolic random values (that's what Theano's random actually returns); we might want to integrate that idea/type of object into the API. Gotta think about that one, though.

@springcoil (Contributor):

I think this needs a rebase. What else needs to be done?

@fonnesbeck (Member):

I need to take a close look. Also, there are no tests written yet, which will be important.

@brandonwillard (Contributor) commented Jun 5, 2016:

There's a lot to be done; this branch was only intended to facilitate a discussion on symbolic shape handling. It does seem possible to make a real PR out of it, though.

In summary:

  • Update the current non-symbolic uses of shape information. This may involve further integration with Theano or on-the-spot evaluation of shapes. See my comments above.
    • In ArrayOrdering
    • In Distribution.random()
      • This also seems like a great time and place to start using Theano's random code and RandomFunction class.
    • ...
  • Implement the size parameter for univariate distributions, so that they work like the example described above.
  • Finish updating multivariate distributions so that they follow the shape convention described in the example above.
  • Update old tests and/or (re-)add a shape keyword for backward compatibility.

@twiecki (Member Author) commented Jun 6, 2016:

Waiting for @fonnesbeck to comment on the API.

@brandonwillard (Contributor) commented Jun 8, 2016:

On a very related note, why aren't the random variable classes (i.e. FreeRV, ObservedRV, TransformedRV) Theano SharedVariables?
PyMC3's use of test/default values sort of implies that SharedVariables might be reasonable. Also, the random variables in PyMC3 would operate more like they do in PyMC2; that is, they would have a value member, as well.

FYI: Theano's random framework appears to use a gof.Op (RandomFunction, specifically) for the type of object PyMC3 refers to as a random variable. That Op then works like a factory, producing a TensorType variable and updating the rng state.
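
For reference, that machinery in its simplest form (a sketch):

import theano
from theano.tensor.shared_randomstreams import RandomStreams

srng = RandomStreams(seed=234)
# `normal` builds a RandomFunction Op node; `rv_n` is a symbolic tensor,
# and the rng state is advanced through the function's implicit updates.
rv_n = srng.normal(size=(2,))
f = theano.function([], rv_n)
f()  # each call returns fresh draws because the rng state is updated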

@fonnesbeck (Member):

@brandonwillard you'd have to ask @jsalvatier to comment on that, as he designed most of PyMC3. PyMC3 is more functional in design than PyMC2, so I believe the idea was to have the values pass through the random variable directly to the trace, thereby relieving the random variable of having to manage its value. I can see the advantages (and disadvantages) of both approaches -- having the values together in one place (the Point) versus distributed across their respective variables.

@jsalvatier (Member):

@fonnesbeck @brandonwillard Yup, that's basically the reason I didn't choose SharedVariable. In retrospect, things have moved more in the direction of using shared variables, since samplers replace some variables with shared variables. It's possible that it would be good to make them shared variables, but that would require some investigation.

@brandonwillard (Contributor):

Ah, ok; I think I can see what you mean. A random variable is expected to depend on other symbolic variables, and I'm not clear on what a "dependent" SharedVariable in Theano would be.

@twiecki (Member Author) commented Jun 15, 2016:

@brandonwillard Sorry for being unresponsive. What's the status on this?

 updates.update(
-    {v: (n * tt.exp(w) + u).reshape(v.tag.test_value.shape)})
+    {v: (n * tt.exp(w) + u).reshape(np.shape(v.tag.test_value))})
@twiecki (Member Author):

Why is that required?

@brandonwillard (Contributor):

test_value was returning a primitive (not a numpy array) in the tests. Not sure if that's due to another issue, yet; still clearing out the big issues following the evaluation hack I just added. Will comment on all this once the tests are passing again (or sooner, if necessary).

@twiecki (Member Author):

Not necessary, was just curious. Great to see you picking this back up!

@brandonwillard (Contributor):

Quick update: I've put in a small hack that should allow these Distribution changes to function in most cases (and, thus, provide the details necessary for addressing issues like #1069).

The basic idea is that we're fine evaluating a symbolic Theano shape object if all its inputs are constants. We can then use the evaluated shape in all the places that aren't ready for purely symbolic shape objects. At the very least it allows us to start setting up automatic [Theano] shape logic, which could make the high level interface simpler.

Some of this shape logic is demonstrated for the simplest case (univariate) here.

I added a function has_const_inputs that helps determine if all inputs are constant. Seems like a thing that might already exist in Theano, but nothing came to mind and that was simple enough to write. The function theano.tensor.get_scalar_constant_value seemed like the right direction, but it doesn't handle enough cases to work for this.
Originally, the idea was to compile a Theano function to evaluate the shape object, which could also be used when the shape has variable dependencies. The eval approach is just for simplicity, and, since I'm sure that some of the intermediate steps in constructing a proper Theano function could replace has_const_inputs, I'll revisit this Theano function approach soon.
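
The gist of has_const_inputs, sketched (the branch's actual implementation may differ):

import theano.tensor as tt
from theano import gof

def has_const_inputs(var):
    # True when every leaf of `var`'s graph is a constant, in which
    # case eagerly evaluating the symbolic shape is safe.
    return all(isinstance(inp, gof.Constant)
               for inp in gof.graph.inputs([var]))

shape = tt.stack([tt.constant(3), tt.constant(4)])
if has_const_inputs(shape):
    concrete_shape = tuple(shape.eval())  # (3, 4)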

@brandonwillard (Contributor):

I gathered all the shape logic and put it into one class (here). That should cover univariate and multivariate cases, as well as keep the code organized and contained. Next step is to go through the distributions and wire them to use that DistributionShape class.

I've also added some logic to handle "old style" distributions that only specify shape. Hopefully that will put things back on track when everything is set up.
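
As a rough picture of that container (a sketch; the field names follow the shape_reps + shape_ind + shape_supp convention from the discussion above, and the actual class in the branch may differ):

class DistributionShape(object):
    """Sketch: one place to hold a distribution's three shape components."""

    def __init__(self, shape_supp=(), shape_ind=(), shape_reps=()):
        self.shape_supp = tuple(shape_supp)  # shape of a single draw's support
        self.shape_ind = tuple(shape_ind)    # independent, non-identical copies
        self.shape_reps = tuple(shape_reps)  # iid replications (e.g. from `size`)

    @property
    def shape(self):
        # The full shape under the fixed dimension ordering.
        return self.shape_reps + self.shape_ind + self.shape_supp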

@twiecki (Member Author) commented Jan 6, 2017:

@brandonwillard that looks like a useful abstraction. Btw, will this allow dynamic shapes? CC @ferrine

@brandonwillard (Contributor):

@twiecki, it won't allow dynamic shapes, but it should get rid of the need to specify shape in cases where the shape is actually fixed/constant. More important, it leaves us in a position to start gradually enabling dynamic shapes.

@twiecki (Member Author) commented Jan 18, 2017:

@brandonwillard That would already be a huge win.

@brandonwillard (Contributor):

Looks like #2833 is doing some very related work.

@twiecki (Member Author) commented Feb 16, 2018 via email

@brandonwillard (Contributor):

Looking closer at #2833 (and its possible source/motivation, #2406), it is addressing the same thing(s).

My first-pass general solution to the issue of symbolic shapes is here. It's self-contained and requires more work in order to be used within the Distribution class(es), but it does single out and provide the needed shape functionality. Also, it might provide some shape-handling steps that can be generalized (e.g. to other backends).

In the meantime, I'm looking over the current state of the project, since I've been out of the loop for a while, and I'll try to follow up soon. Also, this PR could use a description summarizing its efforts! If it isn't salvageable, relative to the current master, then perhaps I'll just submit a new PR to replace this one (and give it a description).

@twiecki (Member Author) commented Feb 19, 2018:

@brandonwillard Thanks for following up. Ideally we'd come up with a joint proposal. Perhaps @aseyboldt could read through your approach and we discuss if and what we want to take. I do like the modularization of this approach.

@twiecki (Member Author) commented Dec 22, 2018:

Closing because of age. Feel free to reopen if you want to fix this up.

@twiecki closed this Dec 22, 2018
@brandonwillard (Contributor):

@twiecki Here's a draft describing the kinds of things I was trying to do with PyMC3, how they relate to the issue(s) in this PR, and some proposals that could become PRs (picking up from this one, perhaps). I'll keep updating it as I find more of my old work—which I'll most likely spread across a couple posts—but input is more than welcome at any time.

@junpenglao (Member) commented Dec 25, 2018:

This is pretty nice, @brandonwillard! Thanks for sharing.
I remember seeing something about combining sympy with theano to do symbolic computation - is your work in any way related to that?

In case you missed it: we recently published our developer guide, https://docs.pymc.io/developer_guide.html; I think some of the graph-related explanation in your blog post could go into the developer guide as well.

@twiecki (Member Author) commented Dec 25, 2018:

@brandonwillard That is extremely cool! I'm excited about this deep thinking and I think it's a good time for some more low-level changes, as we're already changing a few things around for 3.7 (dropping python 2 support, sd->sigma renaming).
