-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow specification of dims instead of shape #3551
Conversation
Perhaps change the getting started notebook to use this? Would be nice to have the example handy. |
What are the implications of the interaction between this and changing the shape of the model, e.g., by resetting a |
@rpgoldman It neither fixes nor breaks (as far as I know) current functionality of changing the data. So basically, it doesn't work. :-( |
As my adviser used to say, "we have reduced this to a previously unsolved problem." 😉 I take your point -- this is really a problem with |
I like that one 😆 Getting the coords is just |
|
This looks really useful! I have a model where I have done a lot of indexing (building on the ideas that @junpenglao showed in his schizophrenia study example), and I think this would make that sort of thing much easier. |
What case study is that? I've been looking for a nice little example that
we can use to demonstrate this.
rpgoldman <[email protected]> schrieb am Sa., 20. Juli 2019, 16:39:
… We can also add a pm.TidyData, that deals with translating strings in a
dataframe into integer arrays for slicing, and makes sure we still get the
right labels in the output.
This looks really useful! I have a model where I have done a lot of
indexing (building on the ideas that @junpenglao
<https://github.com/junpenglao> showed in his schizophrenia study
example), and I think this would make that sort of thing much easier.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3551>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAOLSHN5QS26ZYGLQX5HI23QAMPSRANCNFSM4IEZRJMQ>
.
|
This is the one: |
I just realize that I have a use case for this feature! Oh and the rt.live model could actually use this too. Currently the coords there are tracked as a separate dict. (cc @twiecki @AlexAndorra @canyon289 ) |
This feature does sound interesting to have!
Le ven. 5 juin 2020 à 13:48, Michael Osthege <[email protected]> a
écrit :
… I just realize that I have a use case for this feature!
Anybody else interested in re-activating this PR?
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#3551 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AHIJMTALIKG5G3SDPMS44J3RVDLRRANCNFSM4IEZRJMQ>
.
|
@aseyboldt @AlexAndorra I fixed the conflicts - let's see what Travis says. |
I don't know about you, but on my browser the tests are still pending, which is weird, since your latest commit is from 21 hours ago 🤔 |
@AlexAndorra I triggered a pipeline build (https://travis-ci.org/github/pymc-devs/pymc3/builds/695670690) but the commit IDs don't match. I don't understand it.. But please review and there'll likely be some change that will trigger them again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very nice, thanks a lot for reviving it @michaelosthege 👏
I left a bunch of comments, as this is my first review on this PR. Tell me if anything is unclear.
Also, this will definitely need some new tests. I guess updating the Data container example NB will give us ideas about those tests 😉
Travis is weirdly still hanging on GH, but when you go to Travis page, it seems like the tests passed, so this looks a like a bug -- let's see if it disappears with the new commits 🤷♂️
Co-authored-by: Alexandre ANDORRA <[email protected]>
I just pushed a few commits, including the removal of Some questions are still unresolved:
As this is a new feature, I'd personally prefer to keep it simple & not introduce 4 different ways how coords/dims can be specified. |
Ok, just pushed my changes 🎉
I think the most important is to check the implementation in the code and the new docstrings I wrote for |
GitHub doesn't show it, but the latest Travis build is green 🎉 @AlexAndorra please mark your approval and merge :) |
@michaelosthege I approved but I can't merge yet because Travis didn't tell GH it sucessfully ran 🤦 |
A note to the next unfortunate soul who searches: the updated notebooks are in the repo, but not yet on the website! Also, thanks for this, this looks like it'll simplify my models a fair bit! 🙇♂️ |
Is there any comprehensible documentation on how to use it? The Primer on Bayesian Methods for Multilevel Modeling uses the dims argument, but never explains what it actually does. I'm new to pymc3 and would like to understand how to use dims/shape and what happens with the slicing notation on TransformedRV/FreeRV with TensorSharedVariables like in the example: coords = {"Level": ["Basement", "Floor"], "obs_id": np.arange(floor.size)}
with pm.Model(coords=coords) as pooled_model:
floor_idx = pm.Data("floor_idx", floor, dims="obs_id")
a = pm.Normal("a", 0.0, sigma=10.0, dims="Level")
theta = a[floor_idx]
sigma = pm.Exponential("sigma", 1.0)
y = pm.Normal("y", theta, sigma=sigma, observed=log_radon, dims="obs_id") What happens when dims are a tuple and when to use it? Or take the temperature example. What if the data was in long format with an unequal number of measurements for each city? And how to add a level, e.g. size (small, medium, big), climate zone, or continents using dims? |
Did you find a solution for this type of problem? |
@arvids I'm sorry our documentation is still sparse on this topic. However, the To answer your question: If the number of measurements for each city differs, that's a typical scenario for imputation. . It should work with Just a note: If your data is very big and very sparse at the same time, you may run into performance limitations because of the imputed values take some memory too. In these cases one can model the data in its long-form and use tensor indexing when constructing deterministics. |
Right now we always have to specify the shapes of random variables as numbers. If we use arviz to analyse the trace we then need to specify the dimension names and coordinates for those variables. It would be much easier if we could specify the dimension names and coordinates while building the model.
This is a WIP implementation of that idea. There are two ways to specify new dimensions and their coordinates:
pm.Data
variable with a pandas dataset. We can extract the index and columns of this and remember it for latercoords
argument to model.The stochastic volatility model from the getting started notebook could look this this:
Arviz can then look at
model.coords
andmodel.RV_dims
so that the coordinates show up in the trace:We can specify completely new coordinates like this:
TODO
dims
keyword inpm.Data
to make it more obvious that we are also declaring a new dimension?model.coords
andmodel.RV_dims
to arviz. --> That's WIP Import coords and dims from pymc3 arviz-devs/arviz#757 and will depend on this PR