Bring back SMC and allow prior_predictive_sampling to return transformed values #4769
Branch updated from ee122d0 to 57e4f5a.
LGTM
These failing tests look systematic. Some kind of dtype problem.
Definitely. One of the SMC tests is failing in float32 when there is a discrete variable. It happens in the […] The problem then is that the input can no longer be of type […] This could be a problem in other areas of the codebase that make use of this function. I've seen it in […]
Edit: Possibly related to #4553.
Edit2: Link to the failing test: https://github.com/pymc-devs/pymc3/runs/2821726372?check_suite_focus=true#step:7:599
Also, more generally, within SMC we are treating discrete variables as continuous (e.g., in the proposal). Are we comfortable with this, @aloctavodia? I know that the […]
Branch updated from 57e4f5a to 22da46a.
I temporarily disabled […]
Edit: I restricted this change to cases where it is absolutely needed, and issued an informative […]
Branch updated from 9ebfb4e to 0ffdf22.
Sorry for being late to the party. Yes, I am comfortable with treating discrete variables as continuous when proposing new values.
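To illustrate what "treating discrete variables as continuous when proposing new values" can look like, here is a hypothetical random-walk step that perturbs every coordinate with Gaussian noise and then rounds the discrete coordinates back onto the integers. The function name `propose` and the rounding step are assumptions for illustration only; this thread does not say how PyMC's SMC kernel actually handles discrete coordinates.

```python
import random


def propose(position, is_discrete, scale, rng):
    """Gaussian random-walk proposal (hypothetical sketch, not PyMC's kernel).

    Every coordinate is perturbed as if it were continuous; discrete
    coordinates are then rounded so the proposed point stays on the
    integer support of the discrete prior.
    """
    proposal = []
    for value, discrete in zip(position, is_discrete):
        new = value + rng.gauss(0.0, scale)
        # Values stay Python floats, as they would in a floating-point particle array.
        proposal.append(float(round(new)) if discrete else new)
    return proposal


rng = random.Random(42)
particle = [0.3, 4.0]  # one continuous and one discrete coordinate
proposal = propose(particle, [False, True], 0.5, rng)
```

Rounding after a continuous jump keeps a single proposal scale for all coordinates, at the cost of proposals for discrete variables that are sometimes identical to the current value.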
I noticed this while checking the release notes in: […]
The converter to InferenceData ignores transformed values by default, so I find the phrasing a bit misleading and potentially troublesome. We should probably add an argument to the converter to include transformed variables in the InferenceData; otherwise we'll need to keep the dict return, and the capabilities of the function will depend on the output format chosen.
The plan is to revert this; see #5076. This is no longer needed, as we can use model.initial_point to get transformed prior predictive samples when our samplers need them.
I think I should probably go over the issues and PRs in both pymc and arviz and do a serious cleanup of the integration with InferenceData, but I don't think I'll have time for a while. We still have arguments that were workarounds more than actual fixes and should be removed, as they are generally useless now (e.g. density_dist_obs in to_inference_data, keep_size in sample_posterior_predictive). The presence of transforms is also annoying (arviz-devs/arviz#1509, arviz-devs/arviz#230), and in general we can simplify the converter quite a bit now that it lives in the pymc codebase and no longer needs complicated logic to work with multiple pymc versions. I think we could also make pointwise log-likelihood storage and posterior predictive sampling work with dask: in my experience it is common for the model and posterior to fit in memory while there are so many observations that the pointwise log-likelihood and posterior predictive arrays do not. ArviZ does support working with dask-backed arrays; the main limitation right now is creating them. Maybe other improvements are also relatively low-hanging fruit?
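The memory argument above can be illustrated without dask: the pointwise log-likelihood has one entry per (draw, observation) pair, so with many observations it can be produced and reduced chunk by chunk instead of materialized at once. This plain-Python sketch uses a hypothetical normal likelihood and made-up draws; it shows only the chunking idea, not how PyMC or ArviZ would build dask-backed arrays.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2


def pointwise_loglik_chunks(draws, observed, chunk_size):
    """Yield pointwise log-likelihood blocks of shape (n_draws, <= chunk_size)."""
    for start in range(0, len(observed), chunk_size):
        chunk = observed[start:start + chunk_size]
        yield [[normal_logpdf(y, mu, sigma) for y in chunk] for mu, sigma in draws]


draws = [(0.0, 1.0), (0.5, 2.0)]  # hypothetical posterior draws of (mu, sigma)
observed = [float(i % 7 - 3) for i in range(10)]

# Reduce each block as soon as it is produced (here: total log-likelihood per draw),
# so only one chunk of the (n_draws, n_obs) array is held in memory at a time.
totals = [0.0] * len(draws)
for block in pointwise_loglik_chunks(draws, observed, chunk_size=4):
    for d, row in enumerate(block):
        totals[d] += sum(row)
```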
SMC was broken after the refactoring because it starts with a prior_predictive_sampling call to set up the particles' positions, expecting it to also return transformed values. I extended prior_predictive to return transformed values if (and only if) transformed variables are explicitly passed in the optional var_names argument. For reference, in v3 transformed variables were returned by default. If anyone has a strong opinion about the old default, let me know! Tests were added for this as well as for the now-stale issue #4490.