Maximum-a-posteriori estimate given a MultipleIndependent prior and MCMC sampler #841

Closed
hdahmou opened this issue May 10, 2023 · 6 comments · Fixed by #863 or #867
Labels
bug Something isn't working

Comments

@hdahmou commented May 10, 2023

Hello,

I am encountering an error when trying to estimate the maximum-a-posteriori (MAP) from an MCMCPosterior (method slice_np_vectorized) with a MultipleIndependent prior, created via process_prior from a list of PyTorch distributions such as torch.distributions.HalfNormal(scale=torch.ones(1)*2), after a single round of inference.
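
Roughly, the setup looks like this (a minimal sketch: simulator, theta, x, and x_o are placeholders for my actual model, simulations, and observation; the list of distributions is only illustrative, and I use SNLE here to match the likelihood-based potential in the traceback):

import torch
from sbi.inference import SNLE, MCMCPosterior, likelihood_estimator_based_potential
from sbi.utils import process_prior

# Independent 1D priors combined into a MultipleIndependent prior.
prior, *_ = process_prior([
    torch.distributions.HalfNormal(scale=torch.ones(1) * 2),
    torch.distributions.HalfCauchy(scale=torch.ones(1)),
])

theta = prior.sample((1000,))
x = simulator(theta)  # `simulator` is a placeholder for my actual model

inference = SNLE(prior=prior)
likelihood_estimator = inference.append_simulations(theta, x).train()

potential_fn, parameter_transform = likelihood_estimator_based_potential(
    likelihood_estimator, prior, x_o
)
posterior = MCMCPosterior(
    potential_fn,
    proposal=prior,
    theta_transform=parameter_transform,
    method="slice_np_vectorized",
)
best = posterior.set_default_x(x_o).map(show_progress_bars=True, force_update=True)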

The trace of the error:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[34], line 1
----> 1 best = posterior.map(show_progress_bars=True, force_update=True)

File ~/.../lib/python3.9/site-packages/sbi/inference/posteriors/mcmc_posterior.py:638, in MCMCPosterior.map(self, x, num_iter, num_to_optimize, learning_rate, init_method, num_init_samples, save_best_every, show_progress_bars, force_update)
    583 def map(
    584     self,
    585     x: Optional[Tensor] = None,
   (...)
    593     force_update: bool = False,
    594 ) -> Tensor:
    595     r"""Returns the maximum-a-posteriori estimate (MAP).
    596 
    597     The method can be interrupted (Ctrl-C) when the user sees that the
   (...)
    636         The MAP estimate.
    637     """
--> 638     return super().map(
    639         x=x,
    640         num_iter=num_iter,
    641         num_to_optimize=num_to_optimize,
    642         learning_rate=learning_rate,
    643         init_method=init_method,
    644         num_init_samples=num_init_samples,
    645         save_best_every=save_best_every,
    646         show_progress_bars=show_progress_bars,
    647         force_update=force_update,
    648     )

File ~/.../lib/python3.9/site-packages/sbi/inference/posteriors/base_posterior.py:215, in NeuralPosterior.map(self, x, num_iter, num_to_optimize, learning_rate, init_method, num_init_samples, save_best_every, show_progress_bars, force_update)
    213 if self._map is None or force_update:
    214     self.potential_fn.set_x(self.default_x)
--> 215     self._map = self._calculate_map(
    216         num_iter=num_iter,
    217         num_to_optimize=num_to_optimize,
    218         learning_rate=learning_rate,
    219         init_method=init_method,
    220         num_init_samples=num_init_samples,
    221         save_best_every=save_best_every,
    222         show_progress_bars=show_progress_bars,
    223     )
    224 return self._map

File ~/.../lib/python3.9/site-packages/sbi/inference/posteriors/base_posterior.py:172, in NeuralPosterior._calculate_map(self, num_iter, num_to_optimize, learning_rate, init_method, num_init_samples, save_best_every, show_progress_bars)
    169 else:
    170     raise ValueError
--> 172 return gradient_ascent(
    173     potential_fn=self.potential_fn,
    174     inits=inits,
    175     theta_transform=self.theta_transform,
    176     num_iter=num_iter,
    177     num_to_optimize=num_to_optimize,
    178     learning_rate=learning_rate,
    179     save_best_every=save_best_every,
    180     show_progress_bars=show_progress_bars,
    181 )[0]

File ~/.../lib/python3.9/site-packages/sbi/utils/sbiutils.py:886, in gradient_ascent(potential_fn, inits, theta_transform, num_iter, num_to_optimize, learning_rate, save_best_every, show_progress_bars, interruption_note)
    879 log_probs_of_optimized = potential_fn(
    880     theta_transform.inv(optimize_inits)
    881 )
    882 best_theta_iter = optimize_inits[  # type: ignore
    883     torch.argmax(log_probs_of_optimized)
    884 ]
    885 best_log_prob_iter = potential_fn(
--> 886     theta_transform.inv(best_theta_iter)
    887 )
    888 if best_log_prob_iter > best_log_prob_overall:
    889     best_theta_overall = best_theta_iter.detach().clone()

File ~/.../lib/python3.9/site-packages/torch/distributions/transforms.py:151, in Transform.__call__(self, x)
    147 """
    148 Computes the transform `x => y`.
    149 """
    150 if self._cache_size == 0:
--> 151     return self._call(x)
    152 x_old, y_old = self._cached_x_y
    153 if x is x_old:

File ~/.../lib/python3.9/site-packages/torch/distributions/transforms.py:436, in IndependentTransform._call(self, x)
    434 if x.dim() < self.domain.event_dim:
    435     raise ValueError("Too few dimensions on input")
--> 436 return self.base_transform(x)

File ~/.../lib/python3.9/site-packages/torch/distributions/transforms.py:151, in Transform.__call__(self, x)
    147 """
    148 Computes the transform `x => y`.
    149 """
    150 if self._cache_size == 0:
--> 151     return self._call(x)
    152 x_old, y_old = self._cached_x_y
    153 if x is x_old:

File ~/.../lib/python3.9/site-packages/torch/distributions/transforms.py:1017, in CatTransform._call(self, x)
   1016 def _call(self, x):
-> 1017     assert -x.dim() <= self.dim < x.dim()
   1018     assert x.size(self.dim) == self.length
   1019     yslices = []

AssertionError: 

I think this error is related to #650 (this time with MCMC), but I'm unable to follow the details discussed there, which makes it hard for me to pinpoint the problem in my code.

@michaeldeistler (Contributor) commented May 11, 2023

Thanks for reporting, this indeed sounds like a bug. As a quick workaround, you could turn off the transformation for the .map() computation:

from sbi.inference import likelihood_estimator_based_potential, MCMCPosterior

likelihood_estimator = inference.append_simulations(theta, x).train()

potential_fn, parameter_transform = likelihood_estimator_based_potential(
    likelihood_estimator, prior, x_o
)
posterior = MCMCPosterior(
    potential_fn, proposal=prior, theta_transform=None  # instead of `parameter_transform`
)
map_estimate = posterior.set_default_x(x_o).map()

@michaeldeistler added the bug label on May 11, 2023
@hdahmou (Author) commented May 11, 2023

Thank you for your response. As you suggested, I set theta_transform to None and got a new posterior, but when calling .map() I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[12], line 1
----> 1 map_estimate = posterior.map(show_progress_bars=True)

File ~/linking_game/LinkingGameModel/linking_game_env/lib/python3.9/site-packages/sbi/inference/posteriors/mcmc_posterior.py:638, in MCMCPosterior.map(self, x, num_iter, num_to_optimize, learning_rate, init_method, num_init_samples, save_best_every, show_progress_bars, force_update)
    583 def map(
    584     self,
    585     x: Optional[Tensor] = None,
   (...)
    593     force_update: bool = False,
    594 ) -> Tensor:
    595     r"""Returns the maximum-a-posteriori estimate (MAP).
    596 
    597     The method can be interrupted (Ctrl-C) when the user sees that the
   (...)
    636         The MAP estimate.
    637     """
--> 638     return super().map(
    639         x=x,
    640         num_iter=num_iter,
    641         num_to_optimize=num_to_optimize,
    642         learning_rate=learning_rate,
    643         init_method=init_method,
    644         num_init_samples=num_init_samples,
    645         save_best_every=save_best_every,
    646         show_progress_bars=show_progress_bars,
    647         force_update=force_update,
    648     )

File ~/linking_game/LinkingGameModel/linking_game_env/lib/python3.9/site-packages/sbi/inference/posteriors/base_posterior.py:215, in NeuralPosterior.map(self, x, num_iter, num_to_optimize, learning_rate, init_method, num_init_samples, save_best_every, show_progress_bars, force_update)
    213 if self._map is None or force_update:
    214     self.potential_fn.set_x(self.default_x)
--> 215     self._map = self._calculate_map(
    216         num_iter=num_iter,
    217         num_to_optimize=num_to_optimize,
    218         learning_rate=learning_rate,
    219         init_method=init_method,
    220         num_init_samples=num_init_samples,
    221         save_best_every=save_best_every,
    222         show_progress_bars=show_progress_bars,
    223     )
    224 return self._map

File ~/linking_game/LinkingGameModel/linking_game_env/lib/python3.9/site-packages/sbi/inference/posteriors/base_posterior.py:172, in NeuralPosterior._calculate_map(self, num_iter, num_to_optimize, learning_rate, init_method, num_init_samples, save_best_every, show_progress_bars)
    169 else:
    170     raise ValueError
--> 172 return gradient_ascent(
    173     potential_fn=self.potential_fn,
    174     inits=inits,
    175     theta_transform=self.theta_transform,
    176     num_iter=num_iter,
    177     num_to_optimize=num_to_optimize,
    178     learning_rate=learning_rate,
    179     save_best_every=save_best_every,
    180     show_progress_bars=show_progress_bars,
    181 )[0]

File ~/.../lib/python3.9/site-packages/sbi/utils/sbiutils.py:879, in gradient_ascent(potential_fn, inits, theta_transform, num_iter, num_to_optimize, learning_rate, save_best_every, show_progress_bars, interruption_note)
    876 with torch.no_grad():
    877     if iter_ % save_best_every == 0 or iter_ == num_iter - 1:
    878         # Evaluate the optimized locations and pick the best one.
--> 879         log_probs_of_optimized = potential_fn(
    880             theta_transform.inv(optimize_inits)
    881         )
    882         best_theta_iter = optimize_inits[  # type: ignore
    883             torch.argmax(log_probs_of_optimized)
    884         ]
    885         best_log_prob_iter = potential_fn(
    886             theta_transform.inv(best_theta_iter)
    887         )

File ~/.../lib/python3.9/site-packages/sbi/inference/potentials/likelihood_based_potential.py:99, in LikelihoodBasedPotential.__call__(self, theta, track_gradients)
     91 # Calculate likelihood over trials and in one batch.
     92 log_likelihood_trial_sum = _log_likelihoods_over_trials(
     93     x=self.x_o,
     94     theta=theta.to(self.device),
     95     net=self.likelihood_estimator,
     96     track_gradients=track_gradients,
     97 )
---> 99 return log_likelihood_trial_sum + self.prior.log_prob(theta)

File ~/.../lib/python3.9/site-packages/sbi/utils/user_input_checks_utils.py:314, in MultipleIndependent.log_prob(self, value)
    312     v = value[:, dims_covered : dims_covered + ndims]
    313     # Reshape here to ensure all returned log_probs are 2D for concatenation.
--> 314     log_probs.append(d.log_prob(v).reshape(num_samples, 1))
    315     dims_covered += ndims
    317 # Sum accross last dimension to get joint log prob over all distributions.

File ~/.../lib/python3.9/site-packages/pyro/distributions/torch_patch.py:65, in _HalfCauchy_logprob(self, value)
     62 @patch_dependency("torch.distributions.HalfCauchy.log_prob")
     63 def _HalfCauchy_logprob(self, value):
     64     if self._validate_args:
---> 65         self._validate_sample(value)
     66     value = torch.as_tensor(
     67         value, dtype=self.base_dist.scale.dtype, device=self.base_dist.scale.device
     68     )
     69     log_prob = self.base_dist.log_prob(value) + math.log(2)

File ~/.../lib/python3.9/site-packages/torch/distributions/distribution.py:294, in Distribution._validate_sample(self, value)
    292 valid = support.check(value)
    293 if not valid.all():
--> 294     raise ValueError(
    295         "Expected value argument "
    296         f"({type(value).__name__} of shape {tuple(value.shape)}) "
    297         f"to be within the support ({repr(support)}) "
    298         f"of the distribution {repr(self)}, "
    299         f"but found invalid values:\n{value}"
    300     )

ValueError: Expected value argument (Tensor of shape (100, 1)) to be within the support (GreaterThanEq(lower_bound=0.0)) of the distribution HalfCauchy(), but found invalid values:
tensor([[ 4.9898e+00],
	[ 4.7333e-01],
	...
	])

@michaeldeistler (Contributor) commented:

Ah damn. Can you try setting .map(num_init_samples=1) (or some other small value)?

@hdahmou (Author) commented May 11, 2023

Yes, it did work (consistently for values up to num_init_samples=5), but doesn't this mean that the gradient ascent will start from very few initial points, and that some calls to map() will fail whenever a sample drawn from the posterior lies outside the support of the prior distribution?

@michaeldeistler (Contributor) commented:

The MAP is found by performing gradient ascent on posterior.potential(). So yes, with a low number of initial values you risk that all of them end up in local maxima and never find the global MAP.

You could also try .map(num_init_samples=10_000, num_to_optimize=1). This initially samples 10k times from the prior, sorts the samples according to posterior.potential(), and then runs gradient ascent only on the best one.
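
Something like:

map_estimate = posterior.map(
    num_init_samples=10_000,  # draw 10k candidate parameters from the prior
    num_to_optimize=1,        # run gradient ascent only on the best candidate
    show_progress_bars=True,
)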

Anyway, we will have to fix this (it's definitely a bug), but I won't get to it today.

@hdahmou (Author) commented May 12, 2023

When doing so, the optimization occasionally steps to values outside the distribution's support:

Optimizing MAP estimate. Iterations: 9 / 1000. Performance in iteration 0: -1200.72 (= unnormalized log-prob)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[24], line 1
----> 1 best = posterior.map(num_init_samples=10_000, num_to_optimize=1, show_progress_bars=True, force_update=True)

File ~/.../lib/python3.9/site-packages/sbi/inference/posteriors/mcmc_posterior.py:638, in MCMCPosterior.map(self, x, num_iter, num_to_optimize, learning_rate, init_method, num_init_samples, save_best_every, show_progress_bars, force_update)
    583 def map(
    584     self,
    585     x: Optional[Tensor] = None,
   (...)
    593     force_update: bool = False,
    594 ) -> Tensor:
    595     r"""Returns the maximum-a-posteriori estimate (MAP).
    596 
    597     The method can be interrupted (Ctrl-C) when the user sees that the
   (...)
    636         The MAP estimate.
    637     """
--> 638     return super().map(
    639         x=x,
    640         num_iter=num_iter,
    641         num_to_optimize=num_to_optimize,
    642         learning_rate=learning_rate,
    643         init_method=init_method,
    644         num_init_samples=num_init_samples,
    645         save_best_every=save_best_every,
    646         show_progress_bars=show_progress_bars,
    647         force_update=force_update,
    648     )

File ~/.../lib/python3.9/site-packages/sbi/inference/posteriors/base_posterior.py:215, in NeuralPosterior.map(self, x, num_iter, num_to_optimize, learning_rate, init_method, num_init_samples, save_best_every, show_progress_bars, force_update)
    213 if self._map is None or force_update:
    214     self.potential_fn.set_x(self.default_x)
--> 215     self._map = self._calculate_map(
    216         num_iter=num_iter,
    217         num_to_optimize=num_to_optimize,
    218         learning_rate=learning_rate,
    219         init_method=init_method,
    220         num_init_samples=num_init_samples,
    221         save_best_every=save_best_every,
    222         show_progress_bars=show_progress_bars,
    223     )
    224 return self._map

File ~/.../lib/python3.9/site-packages/sbi/inference/posteriors/base_posterior.py:172, in NeuralPosterior._calculate_map(self, num_iter, num_to_optimize, learning_rate, init_method, num_init_samples, save_best_every, show_progress_bars)
    169 else:
    170     raise ValueError
--> 172 return gradient_ascent(
    173     potential_fn=self.potential_fn,
    174     inits=inits,
    175     theta_transform=self.theta_transform,
    176     num_iter=num_iter,
    177     num_to_optimize=num_to_optimize,
    178     learning_rate=learning_rate,
    179     save_best_every=save_best_every,
    180     show_progress_bars=show_progress_bars,
    181 )[0]

File ~/.../lib/python3.9/site-packages/sbi/utils/sbiutils.py:871, in gradient_ascent(potential_fn, inits, theta_transform, num_iter, num_to_optimize, learning_rate, save_best_every, show_progress_bars, interruption_note)
    869 while iter_ < num_iter:
    870     optimizer.zero_grad()
--> 871     probs = potential_fn(theta_transform.inv(optimize_inits)).squeeze()
    872     loss = -probs.sum()
    873     loss.backward()

File ~/.../lib/python3.9/site-packages/sbi/inference/potentials/likelihood_based_potential.py:99, in LikelihoodBasedPotential.__call__(self, theta, track_gradients)
     91 # Calculate likelihood over trials and in one batch.
     92 log_likelihood_trial_sum = _log_likelihoods_over_trials(
     93     x=self.x_o,
     94     theta=theta.to(self.device),
     95     net=self.likelihood_estimator,
     96     track_gradients=track_gradients,
     97 )
---> 99 return log_likelihood_trial_sum + self.prior.log_prob(theta)

File ~/.../lib/python3.9/site-packages/sbi/utils/user_input_checks_utils.py:314, in MultipleIndependent.log_prob(self, value)
    312     v = value[:, dims_covered : dims_covered + ndims]
    313     # Reshape here to ensure all returned log_probs are 2D for concatenation.
--> 314     log_probs.append(d.log_prob(v).reshape(num_samples, 1))
    315     dims_covered += ndims
    317 # Sum accross last dimension to get joint log prob over all distributions.

File ~/.../lib/python3.9/site-packages/pyro/distributions/torch_patch.py:65, in _HalfCauchy_logprob(self, value)
     62 @patch_dependency("torch.distributions.HalfCauchy.log_prob")
     63 def _HalfCauchy_logprob(self, value):
     64     if self._validate_args:
---> 65         self._validate_sample(value)
     66     value = torch.as_tensor(
     67         value, dtype=self.base_dist.scale.dtype, device=self.base_dist.scale.device
     68     )
     69     log_prob = self.base_dist.log_prob(value) + math.log(2)

File ~/.../lib/python3.9/site-packages/torch/distributions/distribution.py:294, in Distribution._validate_sample(self, value)
    292 valid = support.check(value)
    293 if not valid.all():
--> 294     raise ValueError(
    295         "Expected value argument "
    296         f"({type(value).__name__} of shape {tuple(value.shape)}) "
    297         f"to be within the support ({repr(support)}) "
    298         f"of the distribution {repr(self)}, "
    299         f"but found invalid values:\n{value}"
    300     )

ValueError: Expected value argument (Tensor of shape (1, 1)) to be within the support (GreaterThanEq(lower_bound=0.0)) of the distribution HalfCauchy(), but found invalid values:
tensor([[-0.0051]], grad_fn=)
