Unification of submodels and distributions #2485
Thanks for being the main brains behind this proposal and for an excellent write-up. I don't really have much to add; I agree with essentially everything in the OP. I'm still thinking a bit about whether we could come up with an even more elegant way of getting the return values. I think the core premise here, that…
Looks good; one simple solution to unification is: … I prefer to minimise extra syntax or macros defined by Turing.jl. This is not a strict rule, but we should keep the bar high.
This is basically what @penelopeysm mentioned as an alternative that we considered -- see:

```julia
@model function inner()
    x ~ Normal()
    y ~ Bernoulli()
    return "hello"
end

@model function outer()
    (c, retval) ~ to_distribution(inner())
    return nothing
end
```

With this, you cannot condition on … A key benefit of the proposed syntax is that it balances a few concerns: …
Thanks for writing this up, Penny!

Not sure I quite see this. The aim of submodels is to make it so that models can easily be shared across projects and applications, right? If so, there are definitely applications where you have submodels which happen to only represent a log-likelihood, but in a more general case might represent a model where you want to capture the return values. Asking the user to refactor this into a standard Julia function would require either (a) accessing internal DPPL variables, or (b) rewriting the model. (a) seems non-ideal, and (b) isn't so easy if the model they use comes from a different package which the user doesn't have control over. But I guess this is technically something we've already decided to ignore after moving to…
Personally, I'm a bit worried about such an infix operation. It seems a bit too elaborate? Maybe it's worth querying end-users about the syntax?
Would be surprised if we can find a good and performant solution that doesn't involve a…
I think the most likely solution would be: …
Agreed 😬 I would also point out that some of these discussion points seem somewhat independent, e.g. how to represent return- and trace-values from submodels vs. syntax for specifying submodels. Might be worth separating these to avoid getting bogged down in one or the other?
One thing I really like about Penny's proposal is that everything that goes into a … Note the pleasing analogy with … Note also that this would allow us to get rid of …
But not always wanted, no? E.g. if you have a model that is nested 10 levels, you don't necessarily want to prefix all that.
Not prefixing something which is 10 levels down feels extremely dangerous to me. This seems like the kind of feature that feels like a win at first, but which you would quickly regret using when you accidentally define the same symbol somewhere else, and now you're conditioning on the wrong thing and erroring / silently failing. If we want to make it convenient to access symbols which are buried under layers of models, would a safer mechanism not be preferable?
@mhauru @willtebbutt and I discussed submodels this evening (10 Feb). The present issue is that our support for submodels is currently halfway done - we are able to extract the return value of a submodel, but not its latent variables.
(Note that this has always been true, even with the old `@submodel` macro; TuringLang/DynamicPPL.jl#696 merely changed the syntax we used to achieve this.)

(1) Overview
After a fair bit of back and forth, the summary of the interface we would like is something along these lines:
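Sketch only: `{OP}` is a placeholder for the infix operator (so this is not yet valid Julia), and `retval` is just an ordinary binding name:

```julia
@model function inner()
    x ~ Normal()
    return "hello"
end

@model function outer()
    # `c` collects inner()'s random variables, so `c.x` is available below;
    # inner()'s return value ("hello") is bound to `retval`.
    c ~ inner() {OP} retval
    return retval
end
```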
for some infix operator `{OP}` (see section 3.2 below for some possible options).

Note that there are several changes with respect to the current behaviour (as of 10/2/2025):
- We get rid of `to_submodel` if possible (I am not totally sure if this is doable).
- Subvariables are accessed with plain field syntax, i.e. `@show c.x` instead of `@show var"c.x"`.

Although we are collectively in favour of this interface, this is not meant to be set in stone yet, and there are several further points of discussion, which are detailed below.
(2) Motivation
Turing models in general have two types of 'useful information' that one might want to extract:

- the values of the random variables; and
- the return value: since `@model function ... end` itself expands into a function definition (the so-called 'model evaluation function'), this function will itself also have a return value.

This return value may be constructed from the random variables' values, and in many of the DynamicPPL/Turing docs, this is indeed the case; however, this is not mandatory and in general the return value can contain arbitrary information.
With models, these two pieces of information are obtained respectively using `rand()` and function calls, as in the sketch below.
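For example, a minimal sketch (the model and the drawn values are illustrative):

```julia
using DynamicPPL, Distributions

@model function inner()
    x ~ Normal()
    y ~ Bernoulli()
    return "hello"
end

m = inner()
vals = rand(m)   # random variables as a NamedTuple, e.g. (x = 0.48, y = true)
ret  = m()       # runs the model and yields its return value, "hello"
```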
Currently, `x ~ to_submodel(inner())` does not assign the random variables in `inner()` to `x`, but rather the return value. This means that there are several inconsistencies between the behaviour of submodels and distributions:
- With a distribution on the rhs, the value of `x` is sampled by calling `rand(dist)`. With a submodel on the rhs, the value of `x` is obtained by calling `inner()()`.
- With a distribution on the rhs, `logpdf(dist, x)` gives the log-probability of `x`; with a submodel, there is no analogous notion of the 'density' of `inner()` evaluated at `x`. This is because the return value `x`, in general, has no relationship to the random variables contained inside `inner()`, and indeed there is no guarantee that a well-defined 'distribution' of return values exists.
- With `x ~ to_submodel(inner())`, although the variables of `inner()` are added to the VarInfo and the resulting chains from sampling, `x` itself is not.

This proposal therefore seeks to unify the behaviour of submodels and distributions in a way that is internally consistent and thus easier for users to intuit. In particular, it is proposed that:
1. The syntax `lhs ~ rhs` is reserved for the results of sampling from a submodel or distribution using `rand()`. The result of sampling from a model should be some kind of data structure (a NamedTuple, struct, or dictionary) which allows for indexing. The variable `lhs` (or its subvariables) should always be part of the VarInfo, and it should be possible to condition on them.
2. We adopt new syntax, in the form of `lhs ~ submodel {OP} retval` where `{OP}` is an infix operator, to extract the return value of a submodel (if so desired). Because distributions do not have return values, this syntax would only be accepted when the middle position is a submodel. The `{OP} retval` section may be omitted, in which case the return value is simply discarded.
3. Running a submodel without extracting its random values (i.e. just writing `submodel {OP} retval` on its own) should be forbidden, because in such a case, users should refactor their code to use a plain Julia function instead of a submodel.
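To make the first point concrete, here is a hypothetical sketch of what this would permit (pseudocode: `{OP}` is still a placeholder, and whether `condition` accepts subvariable `VarName`s like this is exactly one of the open points in the concrete steps below):

```julia
@model function inner()
    x ~ Normal()
    return "hello"
end

@model function outer()
    c ~ inner() {OP} retval   # c.x becomes part of the VarInfo
    y ~ Normal(c.x)           # subvariables usable downstream
    return retval
end

# Hypothetical: conditioning on a subvariable of the submodel.
conditioned = condition(outer(), @varname(c.x) => 1.0)
```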
(3) Concrete steps

1. Decide if the general idea makes sense.
2. Decide on the infix operator `{OP}`. We would probably like the operator to (1) be ASCII-compatible; (2) resemble a rightwards arrow.
   - `~>`, but this is not allowed by the Julia parser.
   - `-->`
   - `>>=` is also possible, and I have a Haskell bias towards it, but it technically conflicts with right-bit-shift-and-assign.
   - `->` and `=>` are probably best avoided because they are already used for anonymous functions and Pair respectively.
3. Figure out the data structure that should be obtained when sampling from a submodel. Right now, `rand(model)` returns a NamedTuple (see the sketch after this list). To me, this feels like the most natural interface to use; it 'makes sense' that if `t` is a random variable in a submodel, `c ~ submodel` should allow us to access `c.t`. It is possible that we may want to use a different type of data structure that retains more information (i.e. is closer to a varinfo) but still has an interface that allows field access.
4. Figure out how to obtain this data structure when sampling from a submodel. My original proposal was to evaluate submodels with a special wrapper context, say `SubmodelContext`, which would collect sampled variables and their values in a NamedTuple as each `assume` statement was hit. (Note, the behaviour of this would be very similar to `ValuesAsInModelContext`.) However, it seems quite plausible that this could be obtained simply by subsetting the global varinfo.
5. Implement this in the DynamicPPL compiler. Note that this may require special attention to e.g. operator precedence / associativity, which may in turn place more restrictions on the possible operators used. Some extra abstract type machinery will likely also be needed if we plan not to wrap submodels in a new type; my suspicion is that this might actually be the hardest part of it.
6. Iron out the odd bits of conditioning submodels. I actually suspect that all the infrastructure necessary for this is already in place, and it's mostly a matter of writing a comprehensive set of tests to make sure that everything behaves 'as expected'.
7. Iron out the expected behaviour when varnames conflict, e.g. if we have `c ~ submodel()` then we should probably not allow the identifier `c` to be reused on the lhs of another tilde.
8. Write tests. And more tests. And more tests. Even with as elegant an implementation as we can come up with, my gut feeling is that there are bound to be many awkward edge cases!
9. Turn the contents of this issue into documentation. (I wrote it up, so the hard bit's already done 😉)
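As a reference point for step 3, this is how `rand(model)` behaves today (values illustrative):

```julia
using DynamicPPL, Distributions

@model function sub()
    t ~ Normal()
    u ~ Beta(2, 2)
    return nothing
end

nt = rand(sub())   # e.g. (t = -0.42, u = 0.57)
nt.t               # plain field access, matching the proposed `c.t` style
```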
(4) Alternatives considered
The main alternative considered was to use two different operators for extracting the random variables and the return value, plus one for extracting both, so something like:
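A rough sketch of the shape this could have taken (pseudocode: `{OP1}`, `{OP2}`, and `{OP3}` are placeholders, not real operators):

```julia
@model function outer()
    c {OP1} inner()               # extract only the random variables
    retval {OP2} inner()          # extract only the return value
    (c, retval) {OP3} inner()     # extract both at once
end
```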
for some infix operators `{OP1}` and `{OP2}`.

We also considered having a single statement `b ~ submodel` return some data structure from which the random variables could be accessed using `b.vars` and the return value with `b.retval`, along the lines of the sketch below.
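Schematically (hypothetical field names as just described):

```julia
@model function outer()
    b ~ inner()       # one binding that bundles everything
    m = b.vars.x      # random variables sit behind a `.vars` field...
    r = b.retval      # ...and the return value behind `.retval`
    return (m, r)
end
```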
However, we all agreed that the main proposal here is better, because its syntax is more elegant and it also does not introduce any extra layers of indirection.