Implement utility to recover marginalized variables from MarginalModel
#285
Conversation
Force-pushed from b0c58b4 to 1fc5d55
Looks great, I will try and see how we can get transforms out of the way
rv_loglike_fn = None
if include_samples:
    sample_rv_outs = pm.Categorical.dist(logit_p=joint_logps)
Are these joint_logps normalized? pm.Categorical won't do it under the hood.
It will when you use logit_p, but the logits added to the inferencedata directly will still not be normalized. I think it may be more intuitive if they are, but I'm not sure.
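For reference, a quick standalone check of the logit_p normalization (this snippet is illustrative, not part of the PR diff):

```python
import numpy as np
import pymc as pm

# Shift the logits by an arbitrary constant: if Categorical softmax-normalizes
# logit_p internally, the logp is unchanged by the shift.
logits = np.log([0.2, 0.3, 0.5]) + 7.0
x = pm.Categorical.dist(logit_p=logits)

print(pm.logp(x, 1).eval())  # approximately log(0.3)
print(np.log(0.3))
```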
if var_names is None:
    var_names = self.marginalized_rvs

joint_logp = self.logp()
self.logp will return the logp with the variables marginalized out, so they won't be part of the graph. I guess that's why you have the on_unused_input issues later? I imagine you want the original logp that does not marginalize the variables, so you can give them values.
That's why I was asking how you handle multiple related marginalized variables. It seems to me you have to either evaluate them one at a time, conditioned on the marginalized variables already evaluated, or create a joint logp for all the combinations of the marginalized variables.
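To make the two options concrete, here is a toy numpy illustration (not the PR code) with two dependent binary marginalized variables z1 and z2 and an unnormalized joint logp table over their four combinations:

```python
import numpy as np
from scipy.special import logsumexp

# Unnormalized logp(z1=i, z2=j | data) for two dependent binary variables.
logp_table = np.log([[0.1, 0.2],
                     [0.3, 0.4]])

# Option 1: enumerate all combinations jointly and normalize once.
joint = logp_table - logsumexp(logp_table)

# Option 2: recover one variable at a time, conditioning on the previous one.
lp_z1 = logsumexp(logp_table, axis=1) - logsumexp(logp_table)               # p(z1 | data)
lp_z2_given_z1 = logp_table - logsumexp(logp_table, axis=1, keepdims=True)  # p(z2 | z1, data)
```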
Here is a test that creates nested marginalized variables: https://github.com/pymc-devs/pymc-experimental/blob/8046695e600970bb30a107376281d3e477a66dd0/pymc_experimental/tests/model/test_marginal_model.py#L169-L204
These get represented as an OpFromGraph with more than one output RV without values. You can raise NotImplementedError for those cases for now, so that you know you're only working with independent, simple marginalized variables. Still, I think you don't want to work with self.logp.
self._logp() skips the marginalization step, maybe that's what you need?
I chose to marginalise each discrete variable one at a time. Presumably I still need to marginalise for that reason.
self.register_rv(rv, name=rv.name)

def recover_marginals(
    self, idata, var_names=None, include_samples=False, extend_inferencedata=True
I would perhaps default to include_samples=True. Also maybe a different name? return_samples, or just sample?
The discrete samples are often not as good for understanding the tail probabilities. This can be seen by comparing the changepoint's logps vs the discrete samples discussed at https://mc-stan.org/docs/stan-users-guide/change-point.html
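A toy illustration of the point (standalone, nothing to do with the PR code): a low-probability state essentially never shows up in a finite number of discrete draws, while its log-probability is available exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.9999, 0.0001])

# Estimate P(state == 1) from 4000 discrete draws vs. reading it off the logp.
draws = rng.choice(2, size=4000, p=p)
print(draws.mean())   # often exactly 0.0, i.e. the tail state is never observed
print(np.log(p[1]))   # the logp still encodes P(state == 1) = 1e-4
```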
I know that, but I still think it doesn't hurt to include them by default. Logits are not something many users grok intuitively.
Updated name
Looks pretty good, I left 2 comments.
I would add a test with multiple marginalized dependent variables, and after that I think this would be pretty much there (except for the transforms, of course).
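A rough sketch of the kind of test being suggested, loosely following the nested-marginalization test linked above (the import path, priors, and exact call signatures are assumptions here, not taken from the PR):

```python
import pymc as pm
import pytensor.tensor as pt

from pymc_experimental import MarginalModel  # assumed import path

with MarginalModel() as m:
    idx = pm.Bernoulli("idx", p=0.75)
    sub_idx = pm.Bernoulli("sub_idx", p=pt.switch(pt.eq(idx, 0), 0.15, 0.95))
    x = pm.Normal("x", mu=idx + sub_idx, sigma=1.0)

m.marginalize([idx, sub_idx])

with m:
    idata = pm.sample()

# recover_marginals should then either return lp_idx / lp_sub_idx consistently,
# or raise NotImplementedError while dependent marginalized RVs are unsupported.
m.recover_marginals(idata)
```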
Force-pushed the MarginalModel branch from 9514f90 to e4c9db7
This looks pretty good 😊
Just two comments plus the question of compiling a single function for when we have multiple marginalized RVs. I am not sure the speed benefits outweigh the extra complexity at this point so fine to leave as is
self.register_rv(rv, name=rv.name)

def recover_marginals(
    self, idata, var_names=None, return_samples=False, extend_inferencedata=True
I still think we should return samples by default.
Ah, one reason I see for why we may want to normalize the lps is that we actually don't need to evaluate the joint logp of the whole model, but only of those variables that depend on the marginalized one. In the future we may want to be more efficient and compile a logp with only those terms.
Yep, I'm convinced. Will make the changes.
I think in the future we can include the optimisations where we compile the joint_logps all at once, as well as only using a logp that includes terms which contain the marginalized variable. But this should work for now.
Force-pushed from 2bdae64 to c21ae69
Small comments, hopefully that's all on my end :)
var_names : sequence of str, optional
    List of Observed variable names for which to compute log_likelihood. Defaults to all observed variables
return_samples : bool, default True
    If True, also return samples of the marginalized variables
The docstrings are wrong. It would also be nice to add a code example, as we do for other methods.
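For instance, something along these lines could work as a docstring example (a sketch assuming the API introduced in this PR; the import path and data are illustrative):

```python
import pymc as pm

from pymc_experimental import MarginalModel  # assumed import path

with MarginalModel() as m:
    idx = pm.Bernoulli("idx", p=0.7)
    y = pm.Normal("y", mu=idx, sigma=1.0, observed=[0.9, 1.2, 1.1])

m.marginalize([idx])

with m:
    idata = pm.sample()

m.recover_marginals(idata, return_samples=True)
# idata.posterior now also contains "idx" (sampled values) and "lp_idx"
# (its log-probabilities), following the naming discussed below.
```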
idata : InferenceData
    InferenceData with var_names added to posterior
It would be good to emphasize that the lps will be called lp_{varname} and will be found in the posterior group; same for the samples.
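For example, the Returns entry could read something like this (wording is only a suggestion):

```
idata : InferenceData
    InferenceData with the log-probabilities added to the posterior group as
    ``lp_{varname}`` and, if ``return_samples=True``, the sampled values
    added as ``{varname}``.
```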
logps = np.array(logvs)
rv_dict["lp_" + rv.name] = log_softmax(
    np.reshape(
        logps,
        tuple(len(coord) for coord in stacked_dims.values()) + logps.shape[1:],
    ),
    axis=len(stacked_dims),
)
rv_dims_dict["lp_" + rv.name] = sample_dims + ("lp_" + rv.name + "_dims",)
Since this is done in both branches, move it out and write only once?
    axis1=0,
    axis2=-1,
Do you mean perhaps moveaxis(..., -1, 0)? This will fail if rv_shape is larger than 1, no?
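A quick numpy illustration of the difference (standalone, just to show why a 2D case hides the problem):

```python
import numpy as np

a = np.empty((4, 2, 3))

print(np.swapaxes(a, 0, -1).shape)   # (3, 2, 4): only the two axes trade places
print(np.moveaxis(a, -1, 0).shape)   # (3, 4, 2): the other axes keep their relative order

# The two only coincide for 2D arrays, so an rv_shape with more than one
# dimension would expose the difference.
```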
If that were the case we should add a test as well.
This adopts logic from replace_finite_discrete_marginal_subgraph; is there a reason this might break when that doesn't?
I think you mean the finite_discrete_marginal_rv_logp, and I think it's a bug there as well.
var_names : sequence of str, optional
    List of Observed variable names for which to compute log_likelihood. Defaults to all observed variables
This is wrong; it's a log-probability, right? And the default is "all marginalized variables".
"""Test that marginalization works for batched random variables""" | ||
with MarginalModel() as m: | ||
sigma = pm.HalfNormal("sigma") | ||
idx = pm.Bernoulli("idx", p=0.7, shape=(2, 2)) |
Here I would give a different length to each dim and check that they come out correctly in the idata.
A nice feature (no need to block this PR, it can be a follow-up issue) would be to reuse dims in the computed variables if the user specified dims for the marginalized variables.
As a follow-up we may want to standardize the signatures of marginalize and recover_marginals to allow passing strings or the variables in either case. Right now each is restricted to a different type, which feels suboptimal.
else:
    var_names = {var_names}

var_names = {var if isinstance(var, str) else var.name for var in var_names}
One reason I don't like sets is that they introduce randomness all over the place.
This made me realize we should allow users to pass a seed and split it for each of the compile_pymc calls when we are sampling the Categorical (there's a get_seeds_per_chain utility in PyMC).
But even with a seed, the draws will be different depending on the order in which we end up creating the functions, due to this set.
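A hypothetical sketch of that seeding idea, outside the PR code (compile_pymc accepts a random_seed; the seed-splitting here just uses numpy, though the PyMC utility mentioned above could do it instead):

```python
import numpy as np
import pymc as pm
from pymc.pytensorf import compile_pymc

# One seed per marginalized RV, derived from a single user-provided seed, so the
# Categorical draws are reproducible and independent of construction order.
seeds = np.random.SeedSequence(123).generate_state(2)

logits = np.log([0.2, 0.3, 0.5])
sampling_fns = [
    compile_pymc([], pm.Categorical.dist(logit_p=logits), random_seed=int(seed))
    for seed in seeds
]
print([fn() for fn in sampling_fns])  # same draws on every run
```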
sample_rv_outs = pymc.Categorical.dist(logit_p=joint_logps)
rv_loglike_fn = compile_pymc(
    inputs=other_values,
    outputs=[log_softmax(joint_logps, axis=0), sample_rv_outs],
Nitpick: move the repeated log_softmax before the if/else.
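Roughly this reshuffle, reusing the names from the quoted diff (an untested sketch, not the PR's actual code):

```python
joint_logps_norm = log_softmax(joint_logps, axis=0)

if return_samples:
    sample_rv_outs = pymc.Categorical.dist(logit_p=joint_logps)
    rv_loglike_fn = compile_pymc(
        inputs=other_values,
        outputs=[joint_logps_norm, sample_rv_outs],
    )
else:
    rv_loglike_fn = compile_pymc(inputs=other_values, outputs=joint_logps_norm)
```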
Force-pushed from 8170109 to 9d9daa4
Small confusion, otherwise everything looks ready
joint_logps = pt.moveaxis(joint_logps, 0, -1)

rv_loglike_fn = None
joint_logps_norm = log_softmax(joint_logps, axis=0)
Shouldn't this be the last axis now?
    axis=1,
)

np.testing.assert_almost_equal(
Can you also add a sanity check asserting that logsumexp(lps) is close to 0?
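e.g. something like the following inside the test, assuming the support ends up on the last axis and the recovered logps live in the posterior group as lp_{varname}:

```python
import numpy as np
from scipy.special import logsumexp

np.testing.assert_allclose(
    logsumexp(idata.posterior["lp_idx"].values, axis=-1),
    0.0,
    atol=1e-6,
)
```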
This is a PR to add support for the recover_marginals method. This allows us to sample values and get access to the logps of discrete variables which we marginalized out during sampling.

Closes #286